Re: arm64/v4.16-rc1: KASAN: use-after-free Read in finish_task_switch

From: Will Deacon
Date: Thu Feb 15 2018 - 13:21:54 EST


On Thu, Feb 15, 2018 at 05:47:54PM +0100, Peter Zijlstra wrote:
> On Thu, Feb 15, 2018 at 02:22:39PM +0000, Will Deacon wrote:
> > Instead, we've come up with a more plausible sequence that can in theory
> > happen on a single CPU:
> >
> > <task foo calls exit()>
> >
> > do_exit
> > exit_mm
>
> If this is the last task of the process, we would expect:
>
> mm_count == 1
> mm_users == 1
>
> at this point.
>
> > mmgrab(mm); // foo's mm has count +1
> > BUG_ON(mm != current->active_mm);
> > task_lock(current);
> > current->mm = NULL;
> > task_unlock(current);
>
> So the whole active_mm is basically the last 'real' mm, and its purpose
> is to avoid switch_mm() between user tasks and kernel tasks.
>
> A kernel task has !->mm. We do this by incrementing mm_count when
> switching from user to kernel task and decrementing when switching from
> kernel to user.
>
> What exit_mm() does is change a user task into a 'kernel' task. So it
> should increment mm_count to mirror the context switch. I suspect this
> is what the mmgrab() in exit_mm() is for.
>
> > <irq and ctxsw to kthread>
> >
> > context_switch(prev=foo, next=kthread)
> > mm = next->mm;
> > oldmm = prev->active_mm;
> >
> > if (!mm) { // True for kthread
> > next->active_mm = oldmm;
> > mmgrab(oldmm); // foo's mm has count +2
> > }
> >
> > if (!prev->mm) { // True for foo
> > rq->prev_mm = oldmm;
> > }
> >
> > finish_task_switch
> > mm = rq->prev_mm;
> > if (mm) { // True (foo's mm)
> > mmdrop(mm); // foo's mm has count +1
> > }
> >
> > [...]
> >
> > <ctxsw to task bar>
> >
> > context_switch(prev=kthread, next=bar)
> > mm = next->mm;
> > oldmm = prev->active_mm; // foo's mm!
> >
> > if (!prev->mm) { // True for kthread
> > rq->prev_mm = oldmm;
> > }
> >
> > finish_task_switch
> > mm = rq->prev_mm;
> > if (mm) { // True (foo's mm)
> > mmdrop(mm); // foo's mm has count +0
>
> The context switch into the next user task will then decrement. At this
> point foo no longer has a reference to its mm, except on the stack.
>
> > }
> >
> > [...]
> >
> > <ctxsw back to task foo>
> >
> > context_switch(prev=bar, next=foo)
> > mm = next->mm;
> > oldmm = prev->active_mm;
> >
> > if (!mm) { // True for foo
> > next->active_mm = oldmm; // This is bar's mm
> > mmgrab(oldmm); // bar's mm has count +1
> > }
> >
> >
> > [return back to exit_mm]
>
> Enter mm_users, this counts the number of tasks associated with the mm.
> We start with 1 in mm_init(), and when it drops to 0, we decrement
> mm_count. Since we also start with mm_count == 1, this would appear
> consistent.
>
> mmput() // --mm_users == 0, which then results in:
>
> > mmdrop(mm); // foo's mm has count -1
>
> In the above case, that's the very last reference to the mm, and since
> we started out with mm_count == 1, this -1 makes 0 and we do the actual
> free.
>
> > At this point, we've got an imbalanced count on the mm and could free it
> > prematurely as seen in the KASAN log.
>
> I'm not sure I see premature. At this point mm_users==0, mm_count==0 and
> we freed mm and there is no further use of the on-stack mm pointer and
> foo no longer has a pointer to it in either ->mm or ->active_mm. It's
> well and proper dead.
>
> > A subsequent context-switch away from foo would therefore result in a
> > use-after-free.
>
> At the above point, foo no longer has a reference to mm, we cleared ->mm
> early, and the context switch to bar cleared ->active_mm. The switch
> back into foo then results with foo->active_mm == bar->mm, which is
> fine.

Bugger, you're right. When we switch off foo after freeing the mm, we'll
actually access it's active mm which points to bar's mm. So whilst this
can explain part of the kasan splat, it doesn't explain the actual
use-after-free.

More head-scratching required :(

Will