Re: [PATCH v3 2/2] slub: Introduce CONFIG_SLUB_RCU_DEBUG

From: Jann Horn
Date: Fri Jul 26 2024 - 10:13:32 EST


On Fri, Jul 26, 2024 at 2:44 AM Andrey Konovalov <andreyknvl@xxxxxxxxx> wrote:
> On Thu, Jul 25, 2024 at 5:32 PM Jann Horn <jannh@xxxxxxxxxx> wrote:
> >
> > Currently, KASAN is unable to catch use-after-free in SLAB_TYPESAFE_BY_RCU
> > slabs because use-after-free is allowed within the RCU grace period by
> > design.
> >
> > Add a SLUB debugging feature which RCU-delays every individual
> > kmem_cache_free() before either actually freeing the object or handing it
> > off to KASAN, and change KASAN to poison freed objects as normal when this
> > option is enabled.
[...]
> > diff --git a/mm/Kconfig.debug b/mm/Kconfig.debug
> > index afc72fde0f03..0c088532f5a7 100644
> > --- a/mm/Kconfig.debug
> > +++ b/mm/Kconfig.debug
> > @@ -70,6 +70,35 @@ config SLUB_DEBUG_ON
> > off in a kernel built with CONFIG_SLUB_DEBUG_ON by specifying
> > "slab_debug=-".
> >
> > +config SLUB_RCU_DEBUG
> > + bool "Make use-after-free detection possible in TYPESAFE_BY_RCU caches"
>
> Perhaps it makes sense to point out in the option description that
> this is related to KASAN's use-after-free detection.

Hmm, yeah, maybe I'll change it to
"Enable UAF detection in TYPESAFE_BY_RCU caches (for KASAN)"
and then we can change that in the future if the feature becomes
usable with other SLUB stuff.

> > + depends on SLUB_DEBUG
>
> Do we need depends on KASAN?

My original thinking was: The feature is supposed to work basically
independently of KASAN. It doesn't currently do anything useful
without KASAN, but if we do something about constructor slabs in the
future, this should make it possible to let SLUB poison freed objects.
(Though that might also require going back to deterministically
RCU-delaying the freeing of objects in the future...)

But yeah, I guess for now the config option is useless without KASAN,
so it's reasonable to make it depend on KASAN. I'll change it that way.
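
Putting the two changes together, the top of the entry would look roughly
like this. This is only a sketch of the intent, not the final hunk; writing
the KASAN dependency as a separate "depends on" line is an assumption, and
the help text is left out:

```
# Sketch only: title and KASAN dependency as discussed above.
config SLUB_RCU_DEBUG
	bool "Enable UAF detection in TYPESAFE_BY_RCU caches (for KASAN)"
	depends on SLUB_DEBUG
	depends on KASAN
	default KASAN_GENERIC || KASAN_SW_TAGS
```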

> > + default KASAN_GENERIC || KASAN_SW_TAGS
> > + help
> > + Make SLAB_TYPESAFE_BY_RCU caches behave approximately as if the cache
> > + was not marked as SLAB_TYPESAFE_BY_RCU and every caller used
> > + kfree_rcu() instead.
> > +
> > + This is intended for use in combination with KASAN, to enable KASAN to
> > + detect use-after-free accesses in such caches.
> > + (KFENCE is able to do that independent of this flag.)
> > +
> > + This might degrade performance.
> > + Unfortunately this also prevents a very specific bug pattern from
> > + triggering (insufficient checks against an object being recycled
> > + within the RCU grace period); so this option can be turned off even on
> > + KASAN builds, in case you want to test for such a bug.
> > +
> > + If you're using this for testing bugs / fuzzing and care about
> > + catching all the bugs WAY more than performance, you might want to
> > + also turn on CONFIG_RCU_STRICT_GRACE_PERIOD.
> > +
> > + WARNING:
> > + This is designed as a debugging feature, not a security feature.
> > + Objects are sometimes recycled without RCU delay under memory pressure.
> > +
> > + If unsure, say N.
> > +
> > config PAGE_OWNER
> > bool "Track page owner"
> > depends on DEBUG_KERNEL && STACKTRACE_SUPPORT
> > diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> > index 7c7fc6ce7eb7..d92cb2e9189d 100644
> > --- a/mm/kasan/common.c
> > +++ b/mm/kasan/common.c
> > @@ -238,7 +238,8 @@ static enum free_validation_result check_slab_free(struct kmem_cache *cache,
> > }
> >
> > static inline bool poison_slab_object(struct kmem_cache *cache, void *object,
> > - unsigned long ip, bool init)
> > + unsigned long ip, bool init,
> > + bool after_rcu_delay)
> > {
> > void *tagged_object = object;
> > enum free_validation_result valid = check_slab_free(cache, object, ip);
> > @@ -251,7 +252,8 @@ static inline bool poison_slab_object(struct kmem_cache *cache, void *object,
> > object = kasan_reset_tag(object);
> >
> > /* RCU slabs could be legally used after free within the RCU period. */
> > - if (unlikely(cache->flags & SLAB_TYPESAFE_BY_RCU))
> > + if (unlikely(cache->flags & SLAB_TYPESAFE_BY_RCU) &&
> > + !after_rcu_delay)
>
> This can be kept on the same line.

ack, I'll change that
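
For reference, the joined condition behaves like the standalone userspace
sketch below. This is not the kernel hunk: the flag value is a placeholder
(the real SLAB_TYPESAFE_BY_RCU bit differs), skip_poison() is a name made
up for illustration, and unlikely() is dropped because it only affects
branch prediction:

```c
#include <stdbool.h>

/* Placeholder bit for illustration only; the kernel's real value differs. */
#define SLAB_TYPESAFE_BY_RCU 0x1u

/*
 * Hypothetical stand-in for the early-return check in poison_slab_object().
 * Returns true when poisoning must be skipped.
 */
static bool skip_poison(unsigned int cache_flags, bool after_rcu_delay)
{
	/* RCU slabs could be legally used after free within the RCU period. */
	if ((cache_flags & SLAB_TYPESAFE_BY_RCU) && !after_rcu_delay)
		return true;

	return false;
}
```

So poisoning is skipped only for a TYPESAFE_BY_RCU cache whose object has
not gone through the RCU delay yet; in every other case the object gets
poisoned as usual.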

[...]
> > + /* Free the object - this will internally schedule an RCU callback. */
> > + kmem_cache_free(cache, p);
> > +
> > + /* We should still be allowed to access the object at this point because
>
> Empty line after /* here and below.

ack, I'll change that