Re: [PATCH v3 2/4] perf: Enqueue SIGTRAP always via task_work.

From: Frederic Weisbecker
Date: Tue Apr 09 2024 - 08:37:04 EST


Le Tue, Apr 09, 2024 at 10:57:32AM +0200, Sebastian Andrzej Siewior a écrit :
> > > static void
> > > perf_event_exit_event(struct perf_event *event, struct perf_event_context *ctx)
> > > {
> > > @@ -13088,6 +13084,18 @@ perf_event_exit_event(struct perf_event *event, struct perf_event_context *ctx)
> > > * Kick perf_poll() for is_event_hup();
> > > */
> > > perf_event_wakeup(parent_event);
> > > + /*
> > > + * Cancel pending task_work and update counters if it has not
> > > + * yet been delivered to userland. free_event() expects the
> > > + * reference counter at one and keeping the event around until
> > > + * the task returns to userland can be a unexpected if there is
> > > + * no signal handler registered.
> > > + */
> > > + if (event->pending_work &&
> > > + task_work_cancel_match(current, task_work_cb_match, event)) {
> > > + put_event(event);
> > > + local_dec(&event->ctx->nr_pending);
> > > + }
> >
> > So exiting task, privileged exec and also exit on exec call into this before
> > releasing the children.
> >
> > And parents rely on put_event() from file close + the task work.
> >
> > But what about remote release of children on file close?
> > See perf_event_release_kernel() directly calling free_event() on them.
>
> Interesting things you are presenting. I had events popping up at random
> even after the task decided that it won't go back to userland to handle
> it so letting it free looked like the only option…
>
> > One possible fix is to avoid the reference count game around task work
> > and flush them on free_event().
> >
> > See here:
> >
> > https://lore.kernel.org/all/202403310406.TPrIela8-lkp@xxxxxxxxx/T/#m63c28147d8ac06b21c64d7784d49f892e06c0e50
>
> That wake_up() within preempt_disable() section breaks on RT.

Ah, but the wake-up still wants to go inside recursion protection somehow or
it could generate task_work loop again due to tracepoint events...

Although... the wake up occurs only when the event is dead after all...

> How do we go on from here?

I'd tend to think you need my patchset first because the problems it
fixes were not easily visible as long as there was an irq work to take
care of things most of the time. But once you rely on task_work only then
these become a real problem. Especially the sync against perf_release().

Thanks.

>
> > Thanks.
>
> Sebastian