Re: [PATCH v2 2/2] slub: Introduce CONFIG_SLUB_RCU_DEBUG

From: Jann Horn
Date: Thu Jul 25 2024 - 10:35:34 EST


On Thu, Jul 25, 2024 at 3:28 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> On 7/24/24 6:34 PM, Jann Horn wrote:
> > Currently, KASAN is unable to catch use-after-free in SLAB_TYPESAFE_BY_RCU
> > slabs because use-after-free is allowed within the RCU grace period by
> > design.
> >
> > Add a SLUB debugging feature which RCU-delays every individual
> > kmem_cache_free() before either actually freeing the object or handing it
> > off to KASAN, and change KASAN to poison freed objects as normal when this
> > option is enabled.
> >
> > Note that this creates an aligned 16-byte area in the middle of the slab
> > metadata area, which kinda sucks but seems to be necessary in order to be
> > able to store an rcu_head in there that can be unpoisoned while the RCU
> > callback is pending.
>
> An alternative could be a head-less variant of kfree_rcu_mightsleep() that
> would fail instead of going into reclaim if it can't allocate, and upon failure
> we would fall back to the old behavior and give up on checking that object?

Yes, true, that would be an option... behaving differently under
memory pressure seems a little weird to me, but it would probably do
the job...

I've now tried implementing it roughly as you suggested; the diffstat
for that (on top of the existing series) looks like this:

include/linux/kasan.h | 24 +++++++++---------------
mm/kasan/common.c     | 23 +++++++----------------
mm/slab.h             |  3 ---
mm/slub.c             | 46 +++++++++++++++++++---------------------------
4 files changed, 35 insertions(+), 61 deletions(-)

Basically it gets rid of all the plumbing I added to stuff more things
into the metadata area, but it has to add a flag to kasan_slab_free()
to tell it whether the call is happening after RCU delay or not.
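Roughly, that means the prototype grows one extra parameter, something
like this (the parameter name here is just my placeholder; v3 may well
name it differently):

	bool kasan_slab_free(struct kmem_cache *s, void *object, bool init,
			     bool after_rcu_delay);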

I'm changing slab_free_hook() to allocate an instance of the struct

struct rcu_delayed_free {
	struct rcu_head head;
	void *object;
};

with kmalloc(sizeof(*delayed_free), GFP_NOWAIT), and then if that
works, I use that to RCU-delay the freeing.
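So the core of the slab_free_hook() change ends up looking roughly like
this (a sketch; the callback name is chosen just for illustration):

	struct rcu_delayed_free *delayed_free;

	delayed_free = kmalloc(sizeof(*delayed_free), GFP_NOWAIT);
	if (delayed_free) {
		delayed_free->object = x;
		/* free (and KASAN-check) later, from the RCU callback */
		call_rcu(&delayed_free->head, slab_free_after_rcu_debug);
		return false; /* don't free the object now */
	}
	/* allocation failed: fall back to not checking this object */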


I think this looks a bit nicer than my original version; I'll go run
the test suite and then send it out as v3.


> But maybe it's just too complicated and we just pay the overhead. At least
> this doesn't concern kmalloc caches with their power-of-two alignment
> guarantees where extra metadata blows things up more.

If we wanted to compress the slab metadata for this down a bit, we
could probably also overlap the out-of-line freepointer with the
rcu_head, since the freepointer can't be in use while the rcu_head is
active... but I figured that since this is a debug feature mainly
intended for KASAN builds, keeping things simple is more important.
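For the record, that overlap would conceptually amount to a union in the
slab metadata area; a purely hypothetical sketch, not something this
series does:

	union {
		void *free_pointer;	  /* live while the object sits on a freelist */
		struct rcu_head rcu_head; /* live while an RCU-delayed free is pending */
	};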

> > (metadata_access_enable/disable doesn't work here because while the RCU
> > callback is pending, it will be accessed by asynchronous RCU processing.)
> > To be able to re-poison the area after the RCU callback is done executing,
> > a new helper kasan_poison_range_as_redzone() is necessary.
> >
> > For now I've configured Kconfig.debug to default-enable this feature in the
> > KASAN GENERIC and SW_TAGS modes; I'm not enabling it by default in HW_TAGS
> > mode because I'm not sure if it might have unwanted performance degradation
> > effects there.
> >
> > Note that this is mostly useful with KASAN in the quarantine-based GENERIC
> > mode; SLAB_TYPESAFE_BY_RCU slabs are basically always also slabs with a
> > ->ctor, and KASAN's assign_tag() currently has to assign fixed tags for
> > those, reducing the effectiveness of SW_TAGS/HW_TAGS mode.
> > (A possible future extension of this work would be to also let SLUB call
> > the ->ctor() on every allocation instead of only when the slab page is
> > allocated; then tag-based modes would be able to assign new tags on every
> > reallocation.)
> >
> > Signed-off-by: Jann Horn <jannh@xxxxxxxxxx>
>
> Acked-by: Vlastimil Babka <vbabka@xxxxxxx> #slab
>
> ...
>
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -450,6 +450,18 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work)
> >
> >  static int shutdown_cache(struct kmem_cache *s)
> >  {
> > +	if (IS_ENABLED(CONFIG_SLUB_RCU_DEBUG) &&
> > +	    (s->flags & SLAB_TYPESAFE_BY_RCU)) {
> > +		/*
> > +		 * Under CONFIG_SLUB_RCU_DEBUG, when objects in a
> > +		 * SLAB_TYPESAFE_BY_RCU slab are freed, SLUB will internally
> > +		 * defer their freeing with call_rcu().
> > +		 * Wait for such call_rcu() invocations here before actually
> > +		 * destroying the cache.
> > +		 */
> > +		rcu_barrier();
> > +	}
>
> I think once we have the series [1] settled (patch 5/6 specifically), the
> delayed destruction could handle this case too?
>
> [1]
> https://lore.kernel.org/linux-mm/20240715-b4-slab-kfree_rcu-destroy-v1-0-46b2984c2205@xxxxxxx/

Ah, thanks for the pointer, I hadn't seen that one.


> > +
> >  	/* free asan quarantined objects */
> >  	kasan_cache_shutdown(s);
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 34724704c52d..999afdc1cffb 100644
>
>