Re: [PATCH v4 2/6] perf: Enqueue SIGTRAP always via task_work.
From: Oleg Nesterov
Date: Wed Dec 04 2024 - 08:49:16 EST
On 11/11, Sebastian Andrzej Siewior wrote:
>
> On 2024-11-08 23:26:36 [+0100], Frederic Weisbecker wrote:
> > > Please see
> > > https://lore.kernel.org/all/1440816150.8932.123.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> > > and the whole thread.
>
> Thank you for this, Oleg.
>
> > > I don't think raw_spin_lock_irq + cmpxchg for each work is a good
> > > idea, but quite possibly I misunderstood this change.
> >
> > I did not realize there could be a gazillion files released in a row. So there
> > could be noticeable performance issues, I guess...
>
> I made a testcase to open 2M (2 * 10^6) files and then exit. This made
> task_work_run() run 2M + 3 callbacks (+ stdin/out/err) for the task.
>
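(Side note, just to make sure I understand what was measured: I assume the
testcase is roughly the sketch below. The 2M count and the open-then-exit
pattern come from your description, everything else is my guess, not the
actual reproducer.)

/*
 * Minimal sketch of such a testcase: open ~2M file descriptors and exit
 * without closing them, so the final task_work_run() has to process one
 * fput() callback per file.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>

#define NR_FILES	(2 * 1000 * 1000)

int main(void)
{
	struct rlimit rlim = { NR_FILES + 16, NR_FILES + 16 };
	int i;

	/* needs a raised hard limit / fs.nr_open to go this high */
	if (setrlimit(RLIMIT_NOFILE, &rlim))
		perror("setrlimit");

	for (i = 0; i < NR_FILES; i++) {
		if (open("/dev/null", O_RDONLY) < 0) {
			perror("open");
			exit(1);
		}
	}

	/* no close(); exit() does all the fput() -> task_work here */
	return 0;
}
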
> Running 70 samples on the "orig" kernel:
> - avg callback time: 1,156,470.3 us
> - 63 samples in the ~1.1 sec range, avg: 1,128,046.7 us
> - 6 samples in the ~1.4 sec range, avg: 1,435,294.8 us
>
> Running 70 samples on the "patched" kernel:
> - avg callback time: 1,278,938.8 us
> - 59 samples in the ~1.2 sec range, avg: 1,230,189.1 us
> - 10 samples in the ~1.5 sec range, avg: 1,555,934.5 us
>
> With the extra lock, the task_work_run() runtime extends by roughly 122ms
> for the 2M invoked callbacks.
> The spikes from 1.1 sec -> 1.4 sec and 1.2 sec -> 1.5 sec are due to
> context switches (there are a few cond_resched()/might_sleep() calls).
>
> It is not that bad, is it?
Not that bad, but I personally dislike this patch for other reasons.
Let's forget that for the moment, though.
The numbers in
[PATCH] task_work: remove fifo ordering guarantee
https://lore.kernel.org/all/1440816150.8932.123.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
didn't look too bad either, yet they convinced Linus and other reviewers.
I still think that fifo makes much more sense. The main (only?) offender
is fput(), so perhaps we can do something like
https://lore.kernel.org/all/20150907134924.GA24254@xxxxxxxxxx/
but when I look at this change now I see it is racy.
Stupid question: what if we revert that "task_work: remove fifo ordering guarantee"
patch above? Could that help?
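For reference, what I mean by "restore the fifo ordering" is basically
bringing back the list reversal which that patch removed from
task_work_run(), between the cmpxchg splice and the callback loop, using
its local work/head/next pointers. From memory (so treat this as a sketch
of the idea, not an actual revert):

		/*
		 * Reverse the spliced list so the works run in the
		 * order they were queued (fifo), as before that patch.
		 */
		head = NULL;
		do {
			next = work->next;
			work->next = head;
			head = work;
			work = next;
		} while (work);
		work = head;
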
I don't really understand this code or the problem. But when I (try to) read the
previous discussion on lore.kernel.org, it seems that perf_pending_task_sync()
fails to cancel event->pending_task because it is called from task_work_run(),
and then rcuwait_wait_event() obviously hangs.
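IOW, my (quite possibly wrong) reading of the hang, written out as a call
chain; the intermediate frames are only what I expect the path to look like:

	task_work_run()			// the task closing the perf fd
	  ____fput()			// queued by fput(perf event file)
	    perf_release()
	      ...
	        free_event()
	          perf_pending_task_sync()
	            task_work_cancel(current, &event->pending_task)
					// fails, the list was already
					// spliced by task_work_run() above
	            rcuwait_wait_event(..., !event->pending_work, ...)
					// sleeps forever: perf_pending_task()
					// could only run from the very
					// task_work_run() we are blocking
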
Your patch can only help if task_work_add(current, &event->pending_task) was
called before fput()->task_work_add(task, &file->f_task_work), right?
So perhaps, if we restore the fifo ordering, we can rely on the fact that
current should call perf_pending_task() before it calls perf_release/free_event?
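To spell out the same thing for the fifo case (again, assuming the
pending_task was queued before the final fput(), which is the "only if"
above, and with the same guessed intermediate frames):

	task_work_add(current, &event->pending_task)	// queued first
	...
	fput(file)
	  task_work_add(current, &file->f_task_work)	// queued later
	...
	task_work_run()			// fifo: runs in queue order
	  perf_pending_task()		// clears event->pending_work
	  ____fput()
	    perf_release() -> ... -> free_event()
	      perf_pending_task_sync()	// sees !pending_work and returns,
					// rcuwait_wait_event() is never reached
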
Sorry in advance, I am sure I have missed something...
Oleg.