Re: Perf: event wakeup discards sched_waking events

From: Mathieu Desnoyers
Date: Mon Jan 14 2019 - 16:37:04 EST


----- On Jan 14, 2019, at 8:09 AM, Peter Zijlstra peterz@xxxxxxxxxxxxx wrote:

> On Thu, Jan 10, 2019 at 12:45:27PM -0500, Mathieu Desnoyers wrote:
>
>> static void perf_pending_event(struct irq_work *entry)
>> {
>> struct perf_event *event = container_of(entry,
>> struct perf_event, pending);
>> int rctx;
>>
>> rctx = perf_swevent_get_recursion_context();
>> /*
>> * If we 'fail' here, that's OK, it means recursion is already disabled
>> * and we won't recurse 'further'.
>> */
>>
>> if (event->pending_disable) {
>> event->pending_disable = 0;
>> perf_event_disable_local(event);
>> }
>>
>> if (event->pending_wakeup) {
>> event->pending_wakeup = 0;
>> perf_event_wakeup(event);
>> }
>>
>> if (rctx >= 0)
>> perf_swevent_put_recursion_context(rctx);
>> }
>>
>> One side-effect of perf_event_wakeup() is to generate a sched_waking
>> event. But I suspect it won't be traced by perf because it is invoked before
>> putting the recursion context.
>>
>> Is there a reason why the wakeup is done before putting the recursion
>> context ?
>
> d525211f9d1b ("perf: Fix irq_work 'tail' recursion")
>
> If we were to allow perf_event_wakeup() to generate its tracepoint, we'd
> be back to square #1, no?

Considering that perf tracing code has side-effects that generate additional
events, it's indeed best not to trace them, otherwise you indeed end up with
tail recursion.

Can ftrace end up in the same situation through rb_wake_up_waiters() ? I suspect
the tail recursion would be hard to trigger if the wakeup only happens once per
page though, unless the events generated end up filling up a page.

FWIW, LTTng avoids this entire issue by using a timer-based polling mechanism
to ensure the tracing code does not call into the scheduler wakeup.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com