Re: [PATCH] mm/slab_common: fix possible double free of kmem_cache

From: Hyeonggon Yoo
Date: Mon Sep 19 2022 - 08:07:53 EST


On Mon, Sep 19, 2022 at 02:03:15PM +0200, Vlastimil Babka wrote:
> On 9/19/22 13:56, Hyeonggon Yoo wrote:
> > On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote:
> >> On 9/19/22 05:12, Feng Tang wrote:
> >> > When running a slub_debug test, kfence's 'test_memcache_typesafe_by_rcu'
> >> > kunit test case causes a use-after-free error:
> >> >
> >
> > If I'm not mistaken, I think the subject should be:
> > s/double free/use after free/g
>
> Well, it's both AFAICS. By the initial use-after-free we can read a wrong
> s->flags that was modified since we freed for the first time, and it can
> lead to another kmem_cache_release() which is basically a double free.
>

Yeah, I realized that just after sending the mail ;)
It is a use-after-free bug that can potentially lead to a double free.

Thank you for the correction!
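
For anyone following along, here is a minimal userspace sketch of how
the stale flag read turns into a second release. It is only a
single-threaded model that replays one unlucky interleaving; it is not
the kernel code. All names in it (model_cache, model_release,
MODEL_TYPESAFE_BY_RCU, destroy_buggy, destroy_fixed, model_demo) are
hypothetical, and clearing the flags in model_release() is my
assumption for emulating freed memory being reused.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for SLAB_TYPESAFE_BY_RCU; not the kernel's value. */
#define MODEL_TYPESAFE_BY_RCU 0x1u

/* Hypothetical userspace model of a cache; not struct kmem_cache. */
struct model_cache {
	unsigned int flags;
	int refcount;
	int releases;	/* model bookkeeping: how many times release ran */
};

/* Models the release path: afterwards the old flags are gone. */
static void model_release(struct model_cache *s)
{
	s->releases++;
	s->flags = 0;	/* assumption: emulates freed memory being reused */
}

/* Re-reads s->flags after the "unlock", like the pre-patch check. */
static int destroy_buggy(struct model_cache *s)
{
	/* the locked region would start here */
	int refcnt = --s->refcount;
	bool deferred = !refcnt && (s->flags & MODEL_TYPESAFE_BY_RCU);
	/* the locked region would end here */

	if (deferred)
		model_release(s);	/* replay: the scheduled work runs first */

	if (!refcnt && !(s->flags & MODEL_TYPESAFE_BY_RCU))
		model_release(s);	/* stale read causes a second release */
	return s->releases;
}

/* Caches the flag inside the "locked" region, like 'refcnt'. */
static int destroy_fixed(struct model_cache *s)
{
	bool rcu_set = s->flags & MODEL_TYPESAFE_BY_RCU;
	int refcnt = --s->refcount;
	bool deferred = !refcnt && rcu_set;

	if (deferred)
		model_release(s);	/* same replayed interleaving */

	if (!refcnt && !rcu_set)
		model_release(s);	/* not taken: cached value is still true */
	return s->releases;
}

/* Runs both variants on identical caches and checks the release counts. */
static void model_demo(void)
{
	struct model_cache a = { MODEL_TYPESAFE_BY_RCU, 1, 0 };
	struct model_cache b = a;

	assert(destroy_buggy(&a) == 2);	/* released twice */
	assert(destroy_fixed(&b) == 1);	/* released once */
}
```

In the model, the only difference between the two variants is where the
flag is read, which is the same idea as caching it next to 'refcnt'.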

> >> > BUG: KASAN: use-after-free in kobject_del+0x14/0x30
> >> > Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261
> >> >
> >> > CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17
> >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
> >> > Call Trace:
> >> > <TASK>
> >> > dump_stack_lvl+0x34/0x48
> >> > print_address_description.constprop.0+0x87/0x2a5
> >> > print_report+0x103/0x1ed
> >> > kasan_report+0xb7/0x140
> >> > kobject_del+0x14/0x30
> >> > kmem_cache_destroy+0x130/0x170
> >> > test_exit+0x1a/0x30
> >> > kunit_try_run_case+0xad/0xc0
> >> > kunit_generic_run_threadfn_adapter+0x26/0x50
> >> > kthread+0x17b/0x1b0
> >> > </TASK>
> >> >
> >> > The cause is inside kmem_cache_destroy():
> >> >
> >> > kmem_cache_destroy
> >> >     acquire lock/mutex
> >> >         shutdown_cache
> >> >             schedule_work(kmem_cache_release) (if RCU flag set)
> >> >     release lock/mutex
> >> >     kmem_cache_release (if RCU flag set)
> >>
> >> ^ not set
> >>
> >> I've fixed that up.
> >>
> >> >
> >> > With certain timing, the scheduled work can run before the
> >> > subsequent RCU flag check, which will then see a wrong state.
> >> >
> >> > Fix it by caching the RCU flag inside the protected area, just like 'refcnt'.
> >
> > Very nice catch, thanks!
> >
> > Otherwise (and with Vlastimil's fix):
> >
> > Looks good to me.
> > Reviewed-by: Hyeonggon Yoo <42.hyeyoo@xxxxxxxxx>
> >
> >> >
> >> > Signed-off-by: Feng Tang <feng.tang@xxxxxxxxx>
> >>
> >> Thanks!
> >>
> >> > ---
> >> >
> >> > note:
> >> >
> >> > The error only happens on linux-next tree, and not in Linus' tree,
> >> > which already has Waiman's commit:
> >> > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy()
> >> > without holding slab_mutex/cpu_hotplug_lock")
> >>
> >> Actually that commit is already in Linus' rc5 too, so I will send your fix
> >> this week too. Added a Fixes: 0495e337b703 (...) too.
> >>
> >> > mm/slab_common.c | 5 ++++-
> >> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> >> > index 07b948288f84..ccc02573588f 100644
> >> > --- a/mm/slab_common.c
> >> > +++ b/mm/slab_common.c
> >> > @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s)
> >> >  void kmem_cache_destroy(struct kmem_cache *s)
> >> >  {
> >> >  	int refcnt;
> >> > +	bool rcu_set;
> >> >
> >> >  	if (unlikely(!s) || !kasan_check_byte(s))
> >> >  		return;
> >> > @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >> >  	cpus_read_lock();
> >> >  	mutex_lock(&slab_mutex);
> >> >
> >> > +	rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU;
> >> > +
> >> >  	refcnt = --s->refcount;
> >> >  	if (refcnt)
> >> >  		goto out_unlock;
> >> > @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s)
> >> >  out_unlock:
> >> >  	mutex_unlock(&slab_mutex);
> >> >  	cpus_read_unlock();
> >> > -	if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU))
> >> > +	if (!refcnt && !rcu_set)
> >> >  		kmem_cache_release(s);
> >> >  }
> >> >  EXPORT_SYMBOL(kmem_cache_destroy);
> >>
> >
>

--
Thanks,
Hyeonggon