Re: [PATCH v4 2/6] perf: Enqueue SIGTRAP always via task_work.

From: Oleg Nesterov
Date: Thu Dec 05 2024 - 04:21:09 EST


On 12/05, Frederic Weisbecker wrote:
>
> Le Wed, Dec 04, 2024 at 02:48:27PM +0100, Oleg Nesterov a écrit :
>
> > So perhaps, if we restore the fifo ordering, we can rely on the fact that
> > current should call perf_pending_task() before it calls perf_release/free_event ?
>
> Hmm but a perf event can still fire between the task_work_add() on fput and the
> actual call to task_work_run() that will run the queue. So &event->pending_task
> can be set either before or after. And then whether fifo or lifo, that would
> still be buggy.

Ah, indeed,

> Or am I missing something?

No, it is me, thanks for correcting me!

> Looking at task_work, it seems that most enqueues happen to the current task.
> AFAICT, only io_uring() does remote enqueue. Would it make sense to have a light
> version of task_work that is only ever used by current? This would be a very
> simple flavour with easy queue and cancellation without locking/atomics/RmW
> operations.

Perhaps, but we also need to avoid the races with task_work_cancel() from
another task. I mean, if a task T does task_work_add_light(work), it can race
with task_work_cancel(T, ...) which can change T->task_works on another CPU.


OK, I can't suggest a good solution for this problem, so I won't argue with
the patch from Sebastian. Unfortunately, with this change we can never restore
the fifo ordering. And note that the current ordering is not lifo, it is
"unpredictable". Plus the extra lock/xchg for each work...

Hmm. I just noticed that task_work_run() needs a simple fix:

--- x/kernel/task_work.c
+++ x/kernel/task_work.c
@@ -235,7 +235,7 @@
raw_spin_unlock_irq(&task->pi_lock);

do {
- next = work->next;
+ next = READ_ONCE(work->next);
work->func(work);
work = next;
cond_resched();

Perhaps it makes sense before the patch from Sebastian even if that patch
removes this do/while loop ?

Oleg.