Re: [PATCH] task_work: return -EBUSY when adding same work

From: zhenguo yao
Date: Sun Jul 11 2021 - 23:45:15 EST


This issue shows up in a stress test of memory UE injection. The test
reports more than one UE to the OS at the same moment, so
do_machine_check --> queue_task_work is called many times and the
mce_kill_me work is added to the list many times. When mce_kill_me is
added to the list it becomes the list head. If it is then added again
before task_work_run is called, the list becomes a loop:
task->task_works = mce_kill_me, mce_kill_me->next = mce_kill_me. When
the task wants to return to user mode and runs task_work_run, it loops
forever: it never returns to user mode and never processes the SIGBUS
that mce_kill_me sent to it. I fixed this with the following patch:
--
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 22791aa..9333696 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1299,7 +1299,9 @@ static void queue_task_work(struct mce *m, int kill_current_task)
 	else
 		current->mce_kill_me.func = kill_me_maybe;
 
-	task_work_add(current, &current->mce_kill_me, TWA_RESUME);
+	/* Avoid endless loops when task_work_run is running */
+	if (READ_ONCE(current->task_works) != &current->mce_kill_me)
+		task_work_add(current, &current->mce_kill_me, TWA_RESUME);
 }
--
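To make the self-loop described above concrete, here is a minimal
user-space sketch. It is not the kernel code: struct work, push() and
walk() are hypothetical stand-ins for struct callback_head, the list
push in task_work_add() and the list walk in task_work_run(), and the
push is deliberately non-atomic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct callback_head. */
struct work {
	struct work *next;
};

/*
 * Simplified, non-atomic push: the new node points at the old head
 * and then becomes the head. Pushing the current head again makes
 * w->next == w.
 */
static void push(struct work **head, struct work *w)
{
	assert(w != NULL);
	w->next = *head;
	*head = w;
}

/*
 * Walk the list, giving up after 'limit' nodes. Returns the number
 * of nodes visited, or -1 if the end of the list was not reached
 * within the limit (i.e. the list most likely contains a loop).
 */
static int walk(struct work *head, int limit)
{
	int n = 0;

	for (struct work *w = head; w != NULL; w = w->next) {
		if (n == limit)
			return -1;
		n++;
	}
	return n;
}
```

Pushing one node twice onto an empty list leaves head == &node and
node.next == &node, so walk() never sees NULL and returns -1. A walk
without the limit would spin forever, which is what the task does
here.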
But I think it is better to return an error from task_work_add when
the same work is added to the list again. A similar problem may happen
in other places, and it is hard to debug when it occurs only rarely.
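
As a rough illustration of that idea (not the actual patch), here is a
user-space sketch that refuses to queue a node that is already
anywhere on the list. struct work and push_unique() are hypothetical
names, and the code is non-atomic, so it ignores the concurrency that
the real task_work_add() has to handle.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct callback_head. */
struct work {
	struct work *next;
};

/*
 * Push 'w' onto the list unless it is already queued.
 * Returns 0 on success, or -EBUSY if 'w' is found anywhere on the
 * list, not only at the head.
 */
static int push_unique(struct work **head, struct work *w)
{
	assert(w != NULL);

	for (struct work *p = *head; p != NULL; p = p->next) {
		if (p == w)
			return -EBUSY;
	}

	w->next = *head;
	*head = w;
	return 0;
}
```

Because the whole list is scanned, this also catches the case where
the work is queued but is no longer the first item, which a check of
only the list head would miss. The cost is a walk of the list on
every add.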

On Mon, Jul 12, 2021 at 10:44 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> On 7/11/21 8:13 PM, zhenguo yao wrote:
> > Yes I hit this condition. The caller is queue_task_work in
> > arch/x86/kernel/cpu/mce/core.c.
> > It is really a BUG. I have submitted another patch to fix it:
> > https://lkml.org/lkml/2021/7/9/186.
>
> That patch seems broken, what happens if mce_kill_me is added already,
> but it isn't the first work item in the list?
>
> --
> Jens Axboe
>