Re: [PATCH] task_work: Consume only item at a time while invoking the callbacks.

From: Oleg Nesterov
Date: Sun Feb 23 2025 - 17:41:03 EST


Well... I won't really argue because I can't suggest a better fix at
least right now. Most probably never.

However, let me say that this patch doesn't make me happy ;) See below.

On 02/21, Sebastian Andrzej Siewior wrote:
>
> Oleg pointed out that this might be problematic if one closes 2.000.000
> files at once. While testing this scenario by opening that many files
> following by exit() to ensure that all files are closed at once, I did
> not observe anything outside of noise.

and this probably means that we can revert c82199061009 ("task_work: remove
fifo ordering guarantee") and restore the fifo ordering which IMO makes much
more sense.

But:

> Fixes: c5d93d23a2601 ("perf: Enqueue SIGTRAP always via task_work.")

Yes. So, to fix this specific problem in perf this patch changes task_work.c

And after this change we can never enforce a "clear" ordering, fifo or even lifo.
The ordering is simply "unpredictable/random".

I'll try to find and read the previous discussions tomorrow, but iirc Frederic
had another solution?

Oleg.

> --- a/kernel/task_work.c
> +++ b/kernel/task_work.c
> @@ -194,7 +194,7 @@ bool task_work_cancel(struct task_struct *task, struct callback_head *cb)
> void task_work_run(void)
> {
> struct task_struct *task = current;
> - struct callback_head *work, *head, *next;
> + struct callback_head *work, *head;
>
> for (;;) {
> /*
> @@ -202,17 +202,7 @@ void task_work_run(void)
> * work_exited unless the list is empty.
> */
> work = READ_ONCE(task->task_works);
> - do {
> - head = NULL;
> - if (!work) {
> - if (task->flags & PF_EXITING)
> - head = &work_exited;
> - else
> - break;
> - }
> - } while (!try_cmpxchg(&task->task_works, &work, head));
> -
> - if (!work)
> + if (!work && !(task->flags & PF_EXITING))
> break;
> /*
> * Synchronize with task_work_cancel_match(). It can not remove
> @@ -220,13 +210,24 @@ void task_work_run(void)
> * But it can remove another entry from the ->next list.
> */
> raw_spin_lock_irq(&task->pi_lock);
> + do {
> + head = NULL;
> + if (work) {
> + head = READ_ONCE(work->next);
> + } else {
> + if (task->flags & PF_EXITING)
> + head = &work_exited;
> + else
> + break;
> + }
> + } while (!try_cmpxchg(&task->task_works, &work, head));
> raw_spin_unlock_irq(&task->pi_lock);
>
> - do {
> - next = work->next;
> - work->func(work);
> - work = next;
> + if (!work)
> + break;
> + work->func(work);
> +
> + if (head)
> cond_resched();
> - } while (work);
> }
> }
> --
> 2.47.2
>