Re: [PATCH] mm/slab_common: fix possible double free of kmem_cache

From: Vlastimil Babka
Date: Mon Sep 19 2022 - 08:03:24 EST


On 9/19/22 13:56, Hyeonggon Yoo wrote:
> On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote:
>> On 9/19/22 05:12, Feng Tang wrote:
>> > When running a slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
>> > kunit test case causes a use-after-free error:
>> >
>
> If I'm not mistaken, I think the subject should be:
> s/double free/use after free/g

Well, it's both AFAICS. Through the initial use-after-free we can read a
stale s->flags that was modified after the cache was freed the first time,
and that can lead to a second kmem_cache_release(), which is basically a
double free.

>> > BUG: KASAN: use-after-free in kobject_del+0x14/0x30
>> > Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261
>> >
>> > CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17
>> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
>> > Call Trace:
>> > <TASK>
>> > dump_stack_lvl+0x34/0x48
>> > print_address_description.constprop.0+0x87/0x2a5
>> > print_report+0x103/0x1ed
>> > kasan_report+0xb7/0x140
>> > kobject_del+0x14/0x30
>> > kmem_cache_destroy+0x130/0x170
>> > test_exit+0x1a/0x30
>> > kunit_try_run_case+0xad/0xc0
>> > kunit_generic_run_threadfn_adapter+0x26/0x50
>> > kthread+0x17b/0x1b0
>> > </TASK>
>> >
>> > The cause is inside kmem_cache_destroy():
>> >
>> > kmem_cache_destroy
>> >   acquire lock/mutex
>> >   shutdown_cache
>> >     schedule_work(kmem_cache_release) (if RCU flag set)
>> >   release lock/mutex
>> >   kmem_cache_release (if RCU flag set)
>>
>> ^ not set
>>
>> I've fixed that up.
>>
>> >
>> > With certain timing, the scheduled work can run before the RCU flag
>> > is checked again after the unlock, so that check reads a wrong state.
>> >
>> > Fix it by caching the RCU flag inside the protected area, just like 'refcnt'.
>
> Very nice catch, thanks!
>
> Otherwise (and with Vlastimil's fix):
>
> Looks good to me.
> Reviewed-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
>
>> >
>> > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
>>
>> Thanks!
>>
>> > ---
>> >
>> > note:
>> >
>> > The error only happens on linux-next tree, and not in Linus' tree,
>> > which already has Waiman's commit:
>> > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
>> > without holding slab_mutex/cpu_hotplug_lock")
>>
>> Actually that commit is already in Linus' rc5 as well, so I will send your
>> fix this week. Also added a Fixes: 0495e337b703 (...) tag.
>>
>> > mm/slab_common.c | 5 ++++-
>> > 1 file changed, 4 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/mm/slab_common.c b/mm/slab_common.c
>> > index 07b948288f84..ccc02573588f 100644
>> > --- a/mm/slab_common.c
>> > +++ b/mm/slab_common.c
>> > @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s)
>> >  void kmem_cache_destroy(struct kmem_cache *s)
>> >  {
>> >  	int refcnt;
>> > +	bool rcu_set;
>> >
>> >  	if (unlikely(!s) || !kasan_check_byte(s))
>> >  		return;
>> > @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
>> >  	cpus_read_lock();
>> >  	mutex_lock(&slab_mutex);
>> >
>> > +	rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
>> > +
>> >  	refcnt = --s->refcount;
>> >  	if (refcnt)
>> >  		goto out_unlock;
>> > @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
>> >  out_unlock:
>> >  	mutex_unlock(&slab_mutex);
>> >  	cpus_read_unlock();
>> > -	if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU))
>> > +	if (!refcnt && !rcu_set)
>> >  		kmem_cache_release(s);
>> >  }
>> >  EXPORT_SYMBOL(kmem_cache_destroy);
>>
>