Re: arm64/v4.16-rc1: KASAN: use-after-free Read in finish_task_switch

From: Peter Zijlstra
Date: Thu Feb 15 2018 - 11:48:12 EST


On Thu, Feb 15, 2018 at 02:22:39PM +0000, Will Deacon wrote:
> Instead, we've come up with a more plausible sequence that can in theory
> happen on a single CPU:
>
> <task foo calls exit()>
>
> do_exit
>   exit_mm

If this is the last task of the process, we would expect:

mm_count == 1
mm_users == 1

at this point.
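
For reference, both counters are plain atomics in struct mm_struct; roughly
(include/linux/mm_types.h, comments mine):

  struct mm_struct {
          ...
          atomic_t mm_users;   /* tasks still using the address space */
          atomic_t mm_count;   /* refs on the struct itself; live mm_users hold one */
          ...
  };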

>     mmgrab(mm); // foo's mm has count +1
>     BUG_ON(mm != current->active_mm);
>     task_lock(current);
>     current->mm = NULL;
>     task_unlock(current);

So active_mm is basically the last 'real' mm, and its purpose is to avoid
a switch_mm() when we switch between user tasks and kernel tasks (a kernel
task simply keeps running on whatever mm was last active).

A kernel task has !->mm. We keep that borrowed mm alive by incrementing
mm_count when we switch from a user task to a kernel task and decrementing
it again when we switch from a kernel task back to a user task.

What exit_mm() does is change a user task into a 'kernel' task. So it
should increment mm_count to mirror the context switch. I suspect this
is what the mmgrab() in exit_mm() is for.
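
For reference, the grab/drop pair on mm_count is roughly this
(include/linux/sched/mm.h, comment mine):

  static inline void mmgrab(struct mm_struct *mm)
  {
          atomic_inc(&mm->mm_count);
  }

  static inline void mmdrop(struct mm_struct *mm)
  {
          /* the final mm_count reference frees the mm_struct */
          if (unlikely(atomic_dec_and_test(&mm->mm_count)))
                  __mmdrop(mm);
  }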

> <irq and ctxsw to kthread>
>
> context_switch(prev=foo, next=kthread)
>   mm = next->mm;
>   oldmm = prev->active_mm;
>
>   if (!mm) { // True for kthread
>     next->active_mm = oldmm;
>     mmgrab(oldmm); // foo's mm has count +2
>   }
>
>   if (!prev->mm) { // True for foo
>     rq->prev_mm = oldmm;
>   }
>
> finish_task_switch
>   mm = rq->prev_mm;
>   if (mm) { // True (foo's mm)
>     mmdrop(mm); // foo's mm has count +1
>   }
>
> [...]
>
> <ctxsw to task bar>
>
> context_switch(prev=kthread, next=bar)
>   mm = next->mm;
>   oldmm = prev->active_mm; // foo's mm!
>
>   if (!prev->mm) { // True for kthread
>     rq->prev_mm = oldmm;
>   }
>
> finish_task_switch
>   mm = rq->prev_mm;
>   if (mm) { // True (foo's mm)
>     mmdrop(mm); // foo's mm has count +0

The context switch into the next user task then decrements mm_count again
(this mmdrop()). At this point foo no longer holds a reference to its mm,
except for the on-stack pointer in exit_mm().

>   }
>
> [...]
>
> <ctxsw back to task foo>
>
> context_switch(prev=bar, next=foo)
>   mm = next->mm;
>   oldmm = prev->active_mm;
>
>   if (!mm) { // True for foo
>     next->active_mm = oldmm; // This is bar's mm
>     mmgrab(oldmm); // bar's mm has count +1
>   }
>
>
> [return back to exit_mm]

Enter mm_users, which counts the number of tasks associated with the mm.
It starts at 1 in mm_init(), and when it drops to 0 we decrement mm_count.
Since mm_count also starts at 1, this would appear consistent.
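
Roughly, that coupling is in kernel/fork.c: mmput() drops mm_users and, on
the final put, __mmput() tears down the address space and ends with an
mmdrop():

  void mmput(struct mm_struct *mm)
  {
          might_sleep();

          if (atomic_dec_and_test(&mm->mm_users))
                  __mmput(mm); /* exit_mmap() etc., then a final mmdrop(mm) */
  }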

mmput() // --mm_users == 0, which then results in:

> mmdrop(mm); // foo's mm has count -1

In the above case that's the very last reference to the mm. The annotations
above are deltas against the initial count, so with mm_count starting at 1
this final mmdrop() brings it to 0 and we do the actual free.

> At this point, we've got an imbalanced count on the mm and could free it
> prematurely as seen in the KASAN log.

I'm not sure I see anything premature there. At this point mm_users == 0,
mm_count == 0, the mm has been freed, there is no further use of the
on-stack mm pointer, and foo no longer has a pointer to it in either ->mm
or ->active_mm. It's well and proper dead.
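
Spelling out the bookkeeping of the above sequence with absolute counts
(starting from mm_users == 1, mm_count == 1):

                                                mm_users  mm_count
  start: foo is the last user of its mm             1         1
  exit_mm(): mmgrab(mm)                             1         2
  ctxsw foo -> kthread: mmgrab(oldmm)               1         3
    finish_task_switch(): mmdrop(rq->prev_mm)       1         2
  ctxsw kthread -> bar:
    finish_task_switch(): mmdrop(rq->prev_mm)       1         1
  ctxsw bar -> foo: grabs bar's mm, not foo's       1         1
  exit_mm(): mmput(mm) -> mmdrop(mm)                0         0   (freed)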

> A subsequent context-switch away from foo would therefore result in a
> use-after-free.

At the above point foo no longer has a reference to its mm: we cleared ->mm
early in exit_mm(), and the context switch away from foo (into the kthread)
cleared ->active_mm. The switch back into foo then leaves
foo->active_mm == bar->mm, which is fine.
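
That clearing is the part of context_switch() which the trace above elides;
roughly, from kernel/sched/core.c:

  if (!prev->mm) {
          prev->active_mm = NULL;
          rq->prev_mm = oldmm;
  }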