Re: [RFC] perf: Delayed userspace unwind (Was: [PATCH v3 00/10] x86: ORC unwinder)

From: Peter Zijlstra
Date: Sat Jul 29 2017 - 05:28:43 EST


On Fri, Jul 28, 2017 at 08:35:16PM -0700, Andy Lutomirski wrote:

> I haven't checked task_work specifically, but a bunch of the exit work
> is permitted to sleep, which is potentially useful.

Yes.

> If this becomes successful enough that we could eventually deprecate
> the old code, I wonder if copy_from_user_nmi() could go away? :)

So we still use that for things like the PEBS IP fixup for older CPUs.
That needs to read the userspace code.

Also, since all this is optional on userspace asking for the new format,
we will probably (forever) need to support userspace not asking for it.

> > +	if (!work->func) {
> > +		work->func = perf_callchain_work;
> > +		/*
> > +		 * We cannot do set_notify_resume() from NMI context,
> > +		 * also, knowing we are already in an interrupted
> > +		 * context and will pass return to userspace, we can
> > +		 * simply set TIF_NOTIFY_RESUME.
> > +		 */
> > +		task_work_add(current, work, false);
> > +		set_tsk_thread_flag(current, TIF_NOTIFY_RESUME);
>
> There's a more or less unavoidable window in which this won't be
> noticed, which could plausibly confuse userspace. It might be
> possible to figure out a way for an NMI to tell if it lands in this
> window, but it would be a bit tricky.

Correct, I have been thinking on how to do that but haven't found
anything particularly nice yet.

> Also, is the task_work code prepared to handle task_work_add during
> exit?

That is one I hadn't thought of, but basically task_work_add() will fail
if the task is too far gone. At that point we should fall back to the
'old' behaviour and simply include the information in the kernel SAMPLE
record.
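
A minimal sketch of that fallback, written as a userspace model rather
than kernel code. Everything prefixed mock_ (and the unwind_mode enum
and sample_user_callchain() helper) is hypothetical and only stands in
for the real task_work/perf machinery; in particular, the -ESRCH return
and the single-slot work list are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are not the kernel's types. */
struct mock_work {
	void (*func)(struct mock_work *work);
};

struct mock_task {
	bool exiting;			/* stands in for "too far gone" */
	bool notify_resume;		/* stands in for TIF_NOTIFY_RESUME */
	struct mock_work *pending;	/* single-slot work list */
};

enum unwind_mode {
	UNWIND_DEFERRED,	/* user callchain emitted on return to user */
	UNWIND_INLINE,		/* 'old' behaviour: callchain in the SAMPLE */
};

static void mock_callchain_work(struct mock_work *work)
{
	(void)work;	/* the deferred unwind would run here */
}

/* Assumed to fail once the task has started exiting. */
static int mock_task_work_add(struct mock_task *task, struct mock_work *work)
{
	if (task->exiting)
		return -ESRCH;
	task->pending = work;
	return 0;
}

static enum unwind_mode sample_user_callchain(struct mock_task *task,
					      struct mock_work *work)
{
	assert(task && work);

	if (work->func)
		return UNWIND_DEFERRED;	/* already queued for this task */

	work->func = mock_callchain_work;
	if (mock_task_work_add(task, work)) {
		/* Too late to defer: undo and unwind inline instead. */
		work->func = NULL;
		return UNWIND_INLINE;
	}

	task->notify_resume = true;
	return UNWIND_DEFERRED;
}
```

The point of the sketch is only the ordering: check the return value of
the add before setting the notify flag, and clear work->func on failure
so a later sample does not believe the work is still queued.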