Re: [PATCH RFC v3 1/6] exterr: Introduce extended syscall error reporting

From: Jonathan Corbet
Date: Mon Sep 14 2015 - 16:19:44 EST


On Fri, 11 Sep 2015 19:00:00 +0300
Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx> wrote:

> It has been pointed out several times that certain system calls' error
> reporting leaves a lot to be desired [1], [2]. Such system calls would
> take complex parameter structures as their input and return -EINVAL if
> one or more parameters are invalid or in conflict leaving it up to the
> user to figure out exactly what is wrong with their request. One such
> syscall is perf_event_open() with its attribute structure containing
> 40+ parameters and tens of parameter validation checks.
>
> This patch introduces a fairly simple infrastructure that allows call
> sites to annotate their error codes with arbitrary strings, which the
> userspace can fetch using a prctl() along with the module name that
> produced the error message, file name, line number and optionally any
> amount of additional information in JSON format.

So, a couple of comments as I try to figure this stuff out...

> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 119823decc..0eeeda62ef 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1776,6 +1776,7 @@ struct task_struct {
> #ifdef CONFIG_DEBUG_ATOMIC_SLEEP
> unsigned long task_state_change;
> #endif
> + int ext_err_code;
> int pagefault_disabled;

Here you add this extended error code, fine. I'll come back to this in a
moment, but first...

[...]

> +static struct ext_err_site *ext_err_site_find(int code)
> +{
> + struct ext_err_domain_desc *dom = ext_err_domain_find(code);
> + struct ext_err_site *site;
> + unsigned long off;
> +
> + if (!code)
> + return NULL;

You've already used "code" by this point, and it looks like
ext_err_domain_find() will react by doing a full, doomed search for it.

> + /* shouldn't happen */
> + if (WARN_ON_ONCE(!dom))
> + return NULL;
> +
> + code -= dom->first;
> + off = (unsigned long)dom->start + dom->err_site_size * code;
> + site = (struct ext_err_site *)off;
> +
> + return site;
> +}
> +
> +int ext_err_copy_to_user(void __user *buf, size_t size)
> +{
> + struct ext_err_domain_desc *dom = ext_err_domain_find(current->ext_err_code);
> + struct ext_err_site *site = ext_err_site_find(current->ext_err_code);

Here too you use ext_err_code without making sure it has a useful value.
This will also result in two calls to ext_err_domain_find(), and, thus, a
redundant search of the domains array. Why not just pass dom to the second
call?

[...]

> + if (!ret)
> + current->ext_err_code = 0;

Back to this code: here is the only place you ever clear it, and this seems
to be an error response in its own right. So, if I read this correctly,
once an extended error has been signalled, it will remain forever in the
task state until another extended error overwrites it, right?

What that says to me is that there will be no way to know whether an error
description returned from prctl() corresponds to an error just reported to
the application or not; it could be an old error from last week. That
strikes me as being less useful than it could otherwise be.

It seems to me that current->ext_err_code needs to be cleared on each
system call entry (except for your special prctl() of course!).

Or have I missed something somewhere?

Thanks,

jon
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/