[PATCH v2] umh: fix out of scope usage when the process is being killed

From: Schspa Shi
Date: Wed Dec 14 2022 - 08:47:22 EST


When the process is killed, wait_for_completion_state() returns
-ERESTARTSYS and call_usermodehelper_exec() returns to its caller, so the
on-stack completion variable goes out of scope and its memory may even be
freed. If the user-mode helper thread completes at the same time, it races
with the caller and may use the no-longer-valid completion.

Consider the following scenario:

            T1                                  T2
------------------------------------------------------------------
call_usermodehelper_exec
                                   call_usermodehelper_exec_async
                                   << do something >>
                                   umh_complete(sub_info);
                                   comp = xchg(&sub_info->complete, NULL);
                                   /* we got the completion */
                                   << context switch >>

<< Being killed >>
retval = wait_for_completion_state(sub_info->complete, state);
if (!retval)
        goto wait_done;

if (wait & UMH_KILLABLE) {
        /* umh_complete() will see NULL and free sub_info */
        if (xchg(&sub_info->complete, NULL))
                goto unlock;
        << xchg() returns NULL because T2 already took the completion >>
}
....
return retval;
}

/*
 * The on-stack completion has reached the end of its life cycle
 * and may already be freed if the process has been reaped.
 */
                                   -------- BUG here --------
                                   if (comp)
                                           complete(comp);

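To fix it, add an additional wait_for_completion() when the xchg() finds
that the helper thread has already taken the completion, so that
call_usermodehelper_exec() does not return until the helper thread has
finished using the completion object. kthread_create_on_node() handles
the same race in this way.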
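
For illustration only, the handoff can be modeled in user space. The sketch
below is not kernel code: it assumes pthreads and C11 atomics, and
struct completion, struct sub_info, helper_thread(), exec_killed() and
KILLED are simplified stand-ins invented for this example. It shows the
rule the fix relies on: whoever loses the exchange on the completion
pointer must not let the on-stack completion go out of scope before the
other side has called complete().

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical, trimmed-down analogue of struct subprocess_info. */
struct sub_info {
	_Atomic(struct completion *) complete;
};

/* T2: claim the completion pointer, then signal it or free sub_info. */
static void *helper_thread(void *arg)
{
	struct sub_info *info = arg;
	struct completion *comp = atomic_exchange(&info->complete, NULL);

	if (comp)
		complete(comp);	/* caller still owns info and frees it */
	else
		free(info);	/* caller gave up first, so we own info */
	return NULL;
}

#define KILLED (-1)	/* placeholder for -ERESTARTSYS */

/*
 * T1: start the helper and act as if the killable wait was interrupted
 * right away. Returns KILLED if the completion was withdrawn before the
 * helper claimed it, or 0 if the helper had already claimed it.
 */
static int exec_killed(pthread_t *helper)
{
	struct completion done;	/* lives on this stack frame only */
	struct sub_info *info = malloc(sizeof(*info));

	if (!info)
		abort();
	init_completion(&done);
	atomic_init(&info->complete, &done);
	if (pthread_create(helper, NULL, helper_thread, info))
		abort();

	if (atomic_exchange(&info->complete, NULL))
		return KILLED;	/* helper will see NULL and free info */

	/* The helper holds &done: wait until its complete() has run. */
	wait_for_completion(&done);
	free(info);
	return 0;
}
```

Without the wait_for_completion() call, the second path would return
while the helper still holds a pointer into the dead stack frame.
Calling exec_killed() in a loop and joining the returned thread each
time exercises both orderings.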

Reported-by: syzbot+10d19d528d9755d9af22@xxxxxxxxxxxxxxxxxxxxxxxxx
Reported-by: syzbot+70d5d5d83d03db2c813d@xxxxxxxxxxxxxxxxxxxxxxxxx
Reported-by: syzbot+83cb0411d0fcf0a30fc1@xxxxxxxxxxxxxxxxxxxxxxxxx
Reported-by: syzbot+c92c6a251d49ceceb625@xxxxxxxxxxxxxxxxxxxxxxxxx
Signed-off-by: Schspa Shi <schspa@xxxxxxxxx>
---

v1->v2:
- Use a new way to fix the race, as kthread_create_on_node does.
- Optimize comments and use more accurate words to describe the problem.

kernel/umh.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/umh.c b/kernel/umh.c
index 850631518665..d8350a195c7f 100644
--- a/kernel/umh.c
+++ b/kernel/umh.c
@@ -452,6 +452,10 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
 		/* umh_complete() will see NULL and free sub_info */
 		if (xchg(&sub_info->complete, NULL))
 			goto unlock;
+		/*
+		 * umh_complete will call complete() shortly.
+		 */
+		wait_for_completion(&done);
 	}
 
 wait_done:
--
2.37.3