Re: [PATCH] umh: fix UAF when the process is being killed

From: Schspa Shi
Date: Thu Dec 15 2022 - 01:31:41 EST



Luis Chamberlain <mcgrof@xxxxxxxxxx> writes:

> Peter, Ingo, Steven would like you're review.
>
> On Tue, Dec 13, 2022 at 03:03:53PM -0800, Luis Chamberlain wrote:
>> On Mon, Dec 12, 2022 at 09:38:31PM +0800, Schspa Shi wrote:
>> > I'd like to upload a V2 patch with the new solution if you prefer the
>> > following way.
>> >
>> > diff --git a/kernel/umh.c b/kernel/umh.c
>> > index 850631518665..8023f11fcfc0 100644
>> > --- a/kernel/umh.c
>> > +++ b/kernel/umh.c
>> > @@ -452,6 +452,11 @@ int call_usermodehelper_exec(struct subprocess_info *sub_info, int wait)
>> > /* umh_complete() will see NULL and free sub_info */
>> > if (xchg(&sub_info->complete, NULL))
>> > goto unlock;
>> > + /*
>> > + * kthreadd (or new kernel thread) will call complete()
>> > + * shortly.
>> > + */
>> > + wait_for_completion(&done);
>> > }
>>
>> Yes much better. Did you verify it fixes the splat found by the bots?
>
> Wait, I'm not sure yet why this would fix it... I first started thinking
> that this may be a good example of a Coccinelle SmPL rule, something like:
>
> DECLARE_COMPLETION_ONSTACK(done);
> foo *foo;
> ...
> foo->completion = &done;
> ...
> queue_work(system_unbound_wq, &foo->work);
> ....
> ret = wait_for_completion_state(&done, state);
> ...
> if (!ret)
> S
> ...
> +wait_for_completion(&done);
>
> But that is pretty complex, and while it may be useful to know how many
> patterns we have like this, it begs the question if generalizing this
> inside the callers is best for -ERESTARTSYS condition is best. What
> do folks think?
>
> The rationale here is that if you queue stuff and give access to the
> completion variable but its on-stack obviously you can end up with the
> queued stuff complete() on a on-stack variable. The issue seems to
> be that wait_for_completion_state() for -ERESTARTSYS still means
> that the already scheduled queue'd work is *about* to run and
> the process with the completion on-stack completed. So we race with
> the end of the routine and the completion on-stack.
>
> It makes me wonder if wait_for_completion() above really is doing
> something more, if it is just helping with timing and is still error
> prone.
>
> The queued work will try the the completion as follows:
>
> static void umh_complete(struct subprocess_info *sub_info)
> {
> struct completion *comp = xchg(&sub_info->complete, NULL);
> /*
> * See call_usermodehelper_exec(). If xchg() returns NULL
> * we own sub_info, the UMH_KILLABLE caller has gone away
> * or the caller used UMH_NO_WAIT.
> */
> if (comp)
> complete(comp);
> else
> call_usermodehelper_freeinfo(sub_info);
> }
>
> So the race is getting -ERESTARTSYS on the process with completion
> on-stack and the above running complete(comp). Why would sprinkling
> wait_for_completion(&done) *after* wait_for_completion_state(&done, state)
> fix this UAF?

The wait_for_completion(&done) is added when xchg(&sub_info->complete,
NULL) return NULL. When it returns NULL, it means the umh_complete was
using the completion variable at the same time and will call complete
in a very short time.

Add wait_for_completion *after* wait_for_completion_state will make the
interruptible/timeout version API not working anymore.

> }
> diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
> index d57a5c1c1cd9..aa7031faca04 100644
> --- a/kernel/sched/completion.c
> +++ b/kernel/sched/completion.c
> @@ -205,8 +205,10 @@ int __sched wait_for_completion_interruptible(struct completion *x)
> {
> long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_INTERRUPTIBLE);
>
> - if (t == -ERESTARTSYS)
> + if (t == -ERESTARTSYS) {
> + wait_for_completion(x);
> return t;
> + }
> return 0;
> }
> EXPORT_SYMBOL(wait_for_completion_interruptible);
> @@ -243,8 +245,10 @@ int __sched wait_for_completion_killable(struct completion *x)
> {
> long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, TASK_KILLABLE);
>
> - if (t == -ERESTARTSYS)
> + if (t == -ERESTARTSYS) {
> + wait_for_completion(x);
> return t;
> + }
> return 0;
> }
> EXPORT_SYMBOL(wait_for_completion_killable);
> @@ -253,8 +257,10 @@ int __sched wait_for_completion_state(struct completion *x, unsigned int state)
> {
> long t = wait_for_common(x, MAX_SCHEDULE_TIMEOUT, state);
>
> - if (t == -ERESTARTSYS)
> + if (t == -ERESTARTSYS) {
> + wait_for_completion(x);
> return t;
> + }
> return 0;
> }
> EXPORT_SYMBOL(wait_for_completion_state);

If we want to make it a generic fix, syntactic sugar can be added to
simplify usage for users.

Consider the following patch.

diff --git a/kernel/sched/completion.c b/kernel/sched/completion.c
index d57a5c1c1cd9..67b7d02c0098 100644
--- a/kernel/sched/completion.c
+++ b/kernel/sched/completion.c
@@ -341,3 +341,33 @@ bool completion_done(struct completion *x)
return true;
}
EXPORT_SYMBOL(completion_done);
+
+void complete_on_stack(struct completion **x)
+{
+ struct completion *comp = xchg(*x, NULL);
+
+ if (comp)
+ complete(comp);
+}
+EXPORT_SYMBOL(complete_on_stack);
+
+int __sched wait_for_completion_state_on_stack(struct completion **x,
+ unsigned int state)
+{
+ struct completion *comp = *x;
+ int retval;
+
+ retval = wait_for_completion_state(comp, state);
+ if (retval) {
+ if (xchg(*x, NULL))
+ return retval;
+
+ /*
+ * complete_on_stack will call complete shortly.
+ */
+ wait_for_completion(comp);
+ }
+
+ return retval;
+}
+EXPORT_SYMBOL(wait_for_completion_state_on_stack);

--
BRs
Schspa Shi