Re: [PATCH v2 0/6] perf: Introduce extended syscall error reporting

From: Alexander Shishkin
Date: Wed Aug 26 2015 - 07:37:49 EST

Ingo Molnar <mingo@xxxxxxxxxx> writes:

> * Ingo Molnar <mingo@xxxxxxxxxx> wrote:
>> * Johannes Berg <johannes@xxxxxxxxxxxxxxxx> wrote:
>> > On Mon, 2015-08-24 at 17:32 +0300, Alexander Shishkin wrote:
>> >
>> > > This time around, I employed a linker trick to convert the structures
>> > > containing extended error information into integers, which are then made to
>> > > look just like normal error codes so that IS_ERR_VALUE() and friends would
>> > > still work correctly on them. So no extra pointers in the struct perf_event
>> > > or anywhere else; the extended error codes are passed around like normal
>> > > error codes. They only need to be converted in syscalls' topmost return
>> > > statements. This is done in 1/6.
>> >
>> > For the record, as we discussed separately, I'd love to see this move to more
>> > general infrastructure. In wireless (nl80211), for example, we have a few
>> > hundred (!) callsites returning -EINVAL, mostly based on malformed netlink
>> > attributes, and it can be very difficult to figure out what went wrong;
>> > debugging mostly employs a variation of Hugh's trick.
>> Absolutely, I suggested this as well earlier today, as the scheduler would like
>> to make use of it in syscalls with extensible ABIs, such as sched_setattr().
>> If people really like this then we could go farther as well and add a standalone
>> 'extended errors system call' as well (SyS_errno_extended_get()), which would
>> allow the recovery of error strings even for system calls that are not easily
>> extensible. We could cache the last error description in the task struct.
> If we do that then we don't even have to introduce per system call error code
> conversion, but could unconditionally save the last extended error info in the
> task struct and continue - this could be done very cheaply with the linker trick
> driven integer ID.
> I.e. system calls could opt in to do:
> return err_str(-EBUSY, "perf/x86: BTS conflicts with active events");
> and the overhead of this would be minimal, we'd essentially do something like this
> to save the error:
> current->err_code = code;
> where 'code' is a build time constant in essence.

I'd propose a mixed approach here: err_str() would still return an
integer in the [-EXT_ERRNO, -MAX_ERRNO] range, which would serve as an
index into the err_site structures, and upon returning to userspace
we'd do:

current->err_code = code;
return ext_errno(code); /* the traditional errno */
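A rough userspace sketch of that encoding. It assumes GCC/clang statement expressions and an ELF link with GNU ld or lld, which define __start_<sec>/__stop_<sec> for a referenced section whose name is a valid C identifier. The section name "err_site", the EXT_ERRNO value of 3000 and the bts_event_init() call site are made up for illustration. MAX_ERRNO and the IS_ERR_VALUE() test mirror the kernel's. Each err_str() site drops one struct err_site into the section, and its position there becomes the code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_ERRNO 4095		/* same value as the kernel's */
#define EXT_ERRNO 3000		/* hypothetical base of the extended range */

/* Same test the kernel's IS_ERR_VALUE() performs. */
#define IS_ERR_VALUE(x) ((unsigned long)(long)(x) >= (unsigned long)-MAX_ERRNO)

struct err_site {
	const char *message;
	const char *owner;
	int code;		/* traditional errno, negative */
};

/* Provided by the linker because the "err_site" section is referenced. */
extern const struct err_site __start_err_site[], __stop_err_site[];

/*
 * Emit one err_site into the "err_site" section and return its index,
 * offset into [-EXT_ERRNO, -MAX_ERRNO]. Assumes fewer than
 * MAX_ERRNO - EXT_ERRNO + 1 sites. The explicit alignment keeps entries
 * packed like an array.
 */
#define err_str(err, msg) ({						\
	static const struct err_site err_site_entry			\
		__attribute__((section("err_site"), used,		\
			       aligned(sizeof(void *)))) = {		\
		.message = (msg), .owner = __FILE__, .code = (err),	\
	};								\
	-(int)(EXT_ERRNO + (&err_site_entry - __start_err_site));	\
})

/* Map an extended code back to its site; NULL for a plain errno. */
static const struct err_site *err_site_of(int code)
{
	long idx = -(long)code - EXT_ERRNO;

	if (idx < 0 || idx >= __stop_err_site - __start_err_site)
		return NULL;
	return &__start_err_site[idx];
}

/* O(1): extended codes become their traditional errno, others pass through. */
static int ext_errno(int code)
{
	const struct err_site *site = err_site_of(code);

	return site ? site->code : code;
}

/* Hypothetical call site in the style of the proposal. */
static int bts_event_init(int conflict)
{
	if (conflict)
		return err_str(-EBUSY, "perf/x86: BTS conflicts with active events");
	return 0;
}
```

In a program with only that one site, bts_event_init(1) yields -3000, which IS_ERR_VALUE() still accepts and ext_errno() turns back into -EBUSY. Ordinary codes such as -EINVAL pass through ext_errno() unchanged.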

Reason: the lifetime of this extended error code would be exactly the
same as that of the traditional error value so that we'd always return
the most recent error and wouldn't be prone to something overwriting the
error code under us.
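To illustrate the lifetime argument, here is a sketch in which a hypothetical syscall_return() helper and a cut-down struct task stand in for the syscall exit path and task_struct, and a plain array replaces the linker-built section. err_code is written on every failing return, extended or not, and on nothing else. Like errno, it therefore always describes the most recent failure.

```c
#include <assert.h>
#include <errno.h>

#define EXT_ERRNO 3000		/* hypothetical base of the extended range */

struct err_site {
	const char *message;
	int code;		/* traditional errno, negative */
};

/* Hypothetical stand-in for the linker-built err_site section. */
static const struct err_site err_sites[] = {
	{ "perf/x86: BTS conflicts with active events", -EBUSY },
};

/* Hypothetical cut-down task_struct. */
struct task {
	int err_code;
};

static int ext_errno(int code)
{
	long idx = -(long)code - EXT_ERRNO;

	if (idx >= 0 && idx < (long)(sizeof(err_sites) / sizeof(err_sites[0])))
		return err_sites[idx].code;
	return code;
}

/*
 * Hypothetical syscall exit hook: remember the full code and hand the
 * caller the traditional errno. Successful returns leave err_code alone,
 * just as they leave errno alone.
 */
static long syscall_return(struct task *t, long ret)
{
	if (ret < 0) {
		t->err_code = (int)ret;
		return ext_errno((int)ret);
	}
	return ret;
}
```

A plain -ENOENT that follows an annotated -EBUSY overwrites err_code, so a later query cannot report the stale BTS message against the wrong failure.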

The problem with code that checks for specific error values has two
sides to it:
* most of the error codes that are checked for shouldn't be annotated
at all and should remain as they are;
* for the ones that actually do need to be checked for, the checks
would change from "if (err == EINTR)" to
"if (ext_errno(err) == EINTR)", which doesn't seem like a big deal,
since ext_errno() is an O(1) lookup.
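For the second point, a minimal sketch of a caller that needs to tell -EINTR apart (kernel-internal codes are negative, so the comparison is against -EINTR). The table entry, EXT_ERRNO value and restart_needed*() helpers are hypothetical. The point is only that the raw comparison misses an annotated -EINTR while the ext_errno() comparison catches it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define EXT_ERRNO 3000		/* hypothetical base of the extended range */

/* Hypothetical stand-in for the err_site section. */
static const struct { const char *message; int code; } err_sites[] = {
	{ "example: wait interrupted by a signal", -EINTR },
};

/* O(1): index into the table, or pass plain errnos through. */
static int ext_errno(int code)
{
	long idx = -(long)code - EXT_ERRNO;

	if (idx >= 0 && idx < (long)(sizeof(err_sites) / sizeof(err_sites[0])))
		return err_sites[idx].code;
	return code;
}

/* Old-style check: stops matching once the -EINTR site is annotated. */
static bool restart_needed_raw(int err)
{
	return err == -EINTR;
}

/* Converted check: matches both plain and annotated -EINTR. */
static bool restart_needed(int err)
{
	return ext_errno(err) == -EINTR;
}
```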

Side note: we should also make sure that only userspace-visible errors
ever get annotated like that, to prevent error message creep (which
would be an even bigger problem if we go ahead and store the extended
error code in task_struct right at the topmost return statement). Perf
example: pretty much all errors that happen around event scheduling,
including whatever the pmu callbacks return, needn't and shouldn't be
annotated at all.

> We could use this even in system calls where the error path is performance
> critical, as all the string recovery and copying overhead would be triggered by
> applications that opt in via the new system call:
> struct err_desc {
> const char *message;
> const char *owner;
> const int code;
> };
> SyS_err_get_desc(struct err_desc *err_desc __user);
> [ Which could perhaps be a prctl() extension as well (PR_GET_ERR_DESC): finally
> some truly matching functionality for prctl(). ]
> Hm?

I like this.
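Here is a sketch of the opt-in query path under these assumptions. err_get_desc() is a hypothetical stand-in for the proposed SyS_err_get_desc()/PR_GET_ERR_DESC, and a plain array replaces the err_site section. Unlike the quoted struct, the user-visible err_desc here carries fixed-size string buffers, because kernel pointers mean nothing to userspace and the strings would have to be copied out. Returning -ENOENT when the last error has no description, and reporting the code as a positive errno, are also assumptions. Only callers that make the query pay for the string copy.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define EXT_ERRNO 3000		/* hypothetical base of the extended range */

struct err_site {
	const char *message;
	const char *owner;
	int code;		/* traditional errno, negative */
};

/* Hypothetical stand-in for the linker-built err_site section. */
static const struct err_site err_sites[] = {
	{ "perf/x86: BTS conflicts with active events", "perf", -EBUSY },
};

/* Hypothetical user-visible layout with copied-out strings. */
struct err_desc {
	char message[128];
	char owner[32];
	int code;		/* positive, as userspace sees errno */
};

/* Hypothetical cut-down task_struct. */
struct task {
	int err_code;
};

/* Fill *desc from the task's last error; -ENOENT if it has no description. */
static int err_get_desc(const struct task *t, struct err_desc *desc)
{
	long idx = -(long)t->err_code - EXT_ERRNO;
	const struct err_site *site;

	if (idx < 0 || idx >= (long)(sizeof(err_sites) / sizeof(err_sites[0])))
		return -ENOENT;

	site = &err_sites[idx];
	snprintf(desc->message, sizeof(desc->message), "%s", site->message);
	snprintf(desc->owner, sizeof(desc->owner), "%s", site->owner);
	desc->code = -site->code;
	return 0;
}
```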
