Re: [PATCH v4 2/6] perf: Enqueue SIGTRAP always via task_work.
From: Frederic Weisbecker
Date: Wed Dec 04 2024 - 19:20:52 EST
Le Wed, Dec 04, 2024 at 02:48:27PM +0100, Oleg Nesterov a écrit :
> On 11/11, Sebastian Andrzej Siewior wrote:
> Not that bad, but I personally dislike this patch for other reasons.
> But lets forget it for the moment.
>
> The numbers in
>
> PATCH] task_work: remove fifo ordering guarantee
> https://lore.kernel.org/all/1440816150.8932.123.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> didn't look too bad too, yet they convinced Linus and other reviewers.
>
> I still think that fifo makes much more sense. The main (only?) offender
> is fput(), so perhaps we can do something like
> https://lore.kernel.org/all/20150907134924.GA24254@xxxxxxxxxx/
> but when I look at this change now I see it is racy.
>
> Stupid question. What if we revert this "task_work: remove fifo ordering guarantee"
> patch above? Can this help?
>
> I don't understand this code and the problem. But when I (try to) read the
> previous discussion on lore.kernel.org it seems that perf_pending_task_sync()
> fails to cancel event->pending_task because it is called from task_work_run()
> and then rcuwait_wait_event() obviously hangs.
>
> Your patch can only help if task_work_add(current, &event->pending_task) was
> called before fput()->task_work_add(task, &file->f_task_work(), right?
Right, IIUC if &event->pending_task was added after then perf_pending_task()
would be called before perf_release() and we wouldn't have the problem.
> So perhaps, if we restore the fifo ordering, we can rely on the fact that
> current should call perf_pending_task() before it calls perf_release/free_event ?
Hmm but a perf event can still fire between the task_work_add() on fput and the
actual call to task_work_run() that will run the queue. So &event->pending_task
can be set either before or after. And then whether fifo or lifo, that would
still be buggy. Or am I missing something?
Looking at task_work, it seems that most enqueues happen to the current task.
AFAICT, only io_uring() does remote enqueue. Would it make sense to have a light
version of task_work that is only ever used by current? This would be a very
simple flavour with easy queue and cancellation without locking/atomics/RmW
operations. We would just need to be extra careful about NMIs. And cancellation
on the current queue would be more deterministic...
Of course we would then lose the advantage of a solution that works for both
remote and current enqueue...
Thanks.