Re: arm64/v4.16-rc1: KASAN: use-after-free Read in finish_task_switch

From: Mathieu Desnoyers
Date: Wed Feb 14 2018 - 13:53:06 EST

In reply to: Mark Rutland: "Re: arm64/v4.16-rc1: KASAN: use-after-free Read in finish_task_switch"
Next in thread: Peter Zijlstra: "Re: arm64/v4.16-rc1: KASAN: use-after-free Read in finish_task_switch"

----- On Feb 14, 2018, at 11:51 AM, Mark Rutland mark.rutland@xxxxxxx wrote:

> On Wed, Feb 14, 2018 at 03:07:41PM +0000, Will Deacon wrote:
>> Hi Mark,
>
> Hi Will,
>
>> Cheers for the report. These things tend to be a pain to debug, but I've had
>> a go.
>
> Thanks for taking a look!
>
>> On Wed, Feb 14, 2018 at 12:02:54PM +0000, Mark Rutland wrote:
>> The interesting thing here is on the exit path:
>>
>> > Freed by task 10882:
>> > save_stack mm/kasan/kasan.c:447 [inline]
>> > set_track mm/kasan/kasan.c:459 [inline]
>> > __kasan_slab_free+0x114/0x220 mm/kasan/kasan.c:520
>> > kasan_slab_free+0x10/0x18 mm/kasan/kasan.c:527
>> > slab_free_hook mm/slub.c:1393 [inline]
>> > slab_free_freelist_hook mm/slub.c:1414 [inline]
>> > slab_free mm/slub.c:2968 [inline]
>> > kmem_cache_free+0x88/0x270 mm/slub.c:2990
>> > __mmdrop+0x164/0x248 kernel/fork.c:604
>>
>> ^^ This should never run, because there's an mmgrab() about 8 lines above
>> the mmput() in exit_mm.
>>
>> > mmdrop+0x50/0x60 kernel/fork.c:615
>> > __mmput kernel/fork.c:981 [inline]
>> > mmput+0x270/0x338 kernel/fork.c:992
>> > exit_mm kernel/exit.c:544 [inline]
>>
>> Looking at exit_mm:
>>
>> mmgrab(mm);
>> BUG_ON(mm != current->active_mm);
>> /* more a memory barrier than a real lock */
>> task_lock(current);
>> current->mm = NULL;
>> up_read(&mm->mmap_sem);
>> enter_lazy_tlb(mm, current);
>> task_unlock(current);
>> mm_update_next_owner(mm);
>> mmput(mm);
>>
>> Then the comment already rings some alarm bells: our spin_lock (as used
>> by task_lock) has ACQUIRE semantics, so the mmgrab (which is unordered
>> due to being an atomic_inc) can be reordered with respect to the assignment
>> of NULL to current->mm.
>>
>> If the exit()ing task had recently migrated from another CPU, then that
>> CPU could concurrently run context_switch() and take this path:
>>
>> if (!prev->mm) {
>> prev->active_mm = NULL;
>> rq->prev_mm = oldmm;
>> }
>
> IIUC, on the prior context_switch, next->mm == NULL, so we set
> next->active_mm to prev->mm.
>
> Then, in this context_switch we set oldmm = prev->active_mm (where prev
> is next from the prior context switch).
>
> ... right?
>
>> which then means finish_task_switch will call mmdrop():
>>
>> struct mm_struct *mm = rq->prev_mm;
>> [...]
>> if (mm) {
>> membarrier_mm_sync_core_before_usermode(mm);
>> mmdrop(mm);
>> }
>
> ... then here we use what was prev->active_mm in the most recent context
> switch.
>
> So AFAICT, we're never concurrently accessing a task_struct::mm field
> here, only prev::{mm,active_mm} while prev is current...
>
> [...]
>
>> diff --git a/kernel/exit.c b/kernel/exit.c
>> index 995453d9fb55..f91e8d56b03f 100644
>> --- a/kernel/exit.c
>> +++ b/kernel/exit.c
>> @@ -534,8 +534,9 @@ static void exit_mm(void)
>> }
>> mmgrab(mm);
>> BUG_ON(mm != current->active_mm);
>> - /* more a memory barrier than a real lock */
>> task_lock(current);
>> + /* Ensure we've grabbed the mm before setting current->mm to NULL */
>> + smp_mb__after_spin_lock();
>> current->mm = NULL;
>
> ... and thus I don't follow why we would need to order these with
> anything more than a compiler barrier (if we're preemptible here).
>
> What have I completely misunderstood? ;)

The compiler barrier would not change anything, because task_lock()
already implies a compiler barrier (provided by the arch spin lock
inline asm memory clobber). So compiler-wise, it cannot move the
mmgrab(mm) after the store "current->mm = NULL".

However, given that the scenario involves multiple CPUs (one doing exit_mm(),
the other doing a context switch), the order in which other CPUs perceive
the loads and stores can differ from program order. And AFAIU nothing
prevents the CPU from ordering the atomic_inc() done by mmgrab(mm) _after_
the store to current->mm.
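
As a rough illustration only, here is a user-space model of that sequence
using C11 atomics instead of kernel primitives. The types and the function
name (model_mm, model_task, model_exit_mm_unordered) are made up for this
sketch and are not kernel APIs. The relaxed operations stand in for an
unordered atomic_inc() followed by a store; C11 does not guarantee that
another thread observes them in program order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct: only the reference count. */
struct model_mm {
	atomic_int mm_count;
};

/* Hypothetical stand-in for struct task_struct: only the mm pointer. */
struct model_task {
	struct model_mm *_Atomic mm;
};

/*
 * Models the two accesses discussed above: an unordered reference-count
 * increment followed by clearing the task's mm pointer. With relaxed
 * ordering on both, another thread may observe the NULL store before
 * it observes the increment.
 */
static void model_exit_mm_unordered(struct model_task *tsk,
				    struct model_mm *mm)
{
	assert(tsk != NULL && mm != NULL);

	/* Like atomic_inc(): atomic, but implies no ordering. */
	atomic_fetch_add_explicit(&mm->mm_count, 1, memory_order_relaxed);

	/* Like "current->mm = NULL": not ordered after the increment. */
	atomic_store_explicit(&tsk->mm, NULL, memory_order_relaxed);
}
```

A single-threaded run always ends with the count incremented and the
pointer cleared; the reordering only matters to a concurrent observer.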

I wonder whether we should simply add an smp_mb__after_atomic() to
mmgrab() instead? I see that, e.g., futex.c does:

static inline void futex_get_mm(union futex_key *key)
{
	mmgrab(key->private.mm);
	/*
	 * Ensure futex_get_mm() implies a full barrier such that
	 * get_futex_key() implies a full barrier. This is relied upon
	 * as smp_mb(); (B), see the ordering comment above.
	 */
	smp_mb__after_atomic();
}

It could prevent nasty subtle bugs in other mmgrab() users.
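
A minimal user-space sketch of what putting the barrier inside the helper
could look like, again with C11 atomics rather than kernel code. The names
model_mm and model_mmgrab are hypothetical, and the seq_cst fence is only an
approximation of smp_mb__after_atomic(), not an exact equivalent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct: only the reference count. */
struct model_mm {
	atomic_int mm_count;
};

/*
 * Models an mmgrab() that implies a full barrier: every caller gets the
 * increment ordered before its later loads and stores, without having to
 * add its own barrier.
 */
static inline void model_mmgrab(struct model_mm *mm)
{
	assert(mm != NULL);

	/* Unordered increment, like atomic_inc(). */
	atomic_fetch_add_explicit(&mm->mm_count, 1, memory_order_relaxed);

	/* Approximates smp_mb__after_atomic() with a full fence. */
	atomic_thread_fence(memory_order_seq_cst);
}
```

The trade-off in this model is that every caller pays for the fence, even
those that do not need the ordering.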

Thoughts?

Thanks,

Mathieu

>
> Thanks,
> Mark.

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
