Re: [PATCH 1/1] slab: ensure slab->obj_exts is clear in a newly allocated slab page

From: Suren Baghdasaryan
Date: Fri Apr 11 2025 - 13:00:04 EST


On Fri, Apr 11, 2025 at 9:27 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 4/11/25 17:57, Suren Baghdasaryan wrote:
> > ktest recently reported crashes while running several buffered io tests
> > with __alloc_tagging_slab_alloc_hook() at the top of the crash call stack.
> > The signature indicates an invalid address dereference with low bits of
> > slab->obj_exts being set. The bits were outside of the range used by
> > page_memcg_data_flags and objext_flags and hence were not masked out
> > by slab_obj_exts() when obtaining the pointer stored in slab->obj_exts.
> > The typical crash log looks like this:
> >
> > 00510 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> > 00510 Mem abort info:
> > 00510 ESR = 0x0000000096000045
> > 00510 EC = 0x25: DABT (current EL), IL = 32 bits
> > 00510 SET = 0, FnV = 0
> > 00510 EA = 0, S1PTW = 0
> > 00510 FSC = 0x05: level 1 translation fault
> > 00510 Data abort info:
> > 00510 ISV = 0, ISS = 0x00000045, ISS2 = 0x00000000
> > 00510 CM = 0, WnR = 1, TnD = 0, TagAccess = 0
> > 00510 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > 00510 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000104175000
> > 00510 [0000000000000010] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
> > 00510 Internal error: Oops: 0000000096000045 [#1] SMP
> > 00510 Modules linked in:
> > 00510 CPU: 10 UID: 0 PID: 7692 Comm: cat Not tainted 6.15.0-rc1-ktest-g189e17946605 #19327 NONE
> > 00510 Hardware name: linux,dummy-virt (DT)
> > 00510 pstate: 20001005 (nzCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
> > 00510 pc : __alloc_tagging_slab_alloc_hook+0xe0/0x190
> > 00510 lr : __kmalloc_noprof+0x150/0x310
> > 00510 sp : ffffff80c87df6c0
> > 00510 x29: ffffff80c87df6c0 x28: 000000000013d1ff x27: 000000000013d200
> > 00510 x26: ffffff80c87df9e0 x25: 0000000000000000 x24: 0000000000000001
> > 00510 x23: ffffffc08041953c x22: 000000000000004c x21: ffffff80c0002180
> > 00510 x20: fffffffec3120840 x19: ffffff80c4821000 x18: 0000000000000000
> > 00510 x17: fffffffec3d02f00 x16: fffffffec3d02e00 x15: fffffffec3d00700
> > 00510 x14: fffffffec3d00600 x13: 0000000000000200 x12: 0000000000000006
> > 00510 x11: ffffffc080bb86c0 x10: 0000000000000000 x9 : ffffffc080201e58
> > 00510 x8 : ffffff80c4821060 x7 : 0000000000000000 x6 : 0000000055555556
> > 00510 x5 : 0000000000000001 x4 : 0000000000000010 x3 : 0000000000000060
> > 00510 x2 : 0000000000000000 x1 : ffffffc080f50cf8 x0 : ffffff80d801d000
> > 00510 Call trace:
> > 00510 __alloc_tagging_slab_alloc_hook+0xe0/0x190 (P)
> > 00510 __kmalloc_noprof+0x150/0x310
> > 00510 __bch2_folio_create+0x5c/0xf8
> > 00510 bch2_folio_create+0x2c/0x40
> > 00510 bch2_readahead+0xc0/0x460
> > 00510 read_pages+0x7c/0x230
> > 00510 page_cache_ra_order+0x244/0x3a8
> > 00510 page_cache_async_ra+0x124/0x170
> > 00510 filemap_readahead.isra.0+0x58/0xa0
> > 00510 filemap_get_pages+0x454/0x7b0
> > 00510 filemap_read+0xdc/0x418
> > 00510 bch2_read_iter+0x100/0x1b0
> > 00510 vfs_read+0x214/0x300
> > 00510 ksys_read+0x6c/0x108
> > 00510 __arm64_sys_read+0x20/0x30
> > 00510 invoke_syscall.constprop.0+0x54/0xe8
> > 00510 do_el0_svc+0x44/0xc8
> > 00510 el0_svc+0x18/0x58
> > 00510 el0t_64_sync_handler+0x104/0x130
> > 00510 el0t_64_sync+0x154/0x158
> > 00510 Code: d5384100 f9401c01 b9401aa3 b40002e1 (f8227881)
> > 00510 ---[ end trace 0000000000000000 ]---
> > 00510 Kernel panic - not syncing: Oops: Fatal exception
> > 00510 SMP: stopping secondary CPUs
> > 00510 Kernel Offset: disabled
> > 00510 CPU features: 0x0000,000000e0,00000410,8240500b
> > 00510 Memory Limit: none
> >
> > Investigation indicates that these bits are already set when we allocate
> > slab page and are not zeroed out after allocation. We are not yet sure
> > why these crashes start happening only recently but regardless of the
> > reason, not initializing a field that gets used later is wrong. Fix it
> > by initializing slab->obj_exts during slab page allocation.
>
> slab->obj_exts overlays page->memcg_data and the checks on page alloc and
> page free should catch any non-zero values, i.e. page_expected_state()
> page_bad_reason() so if anyone is e.g. UAF-writing there or leaving garbage
> there while freeing the page it's a bug.
>
> Perhaps CONFIG_MEMCG is disabled in the ktests and thus the checks are not
> happening? We could extend them for CONFIG_SLAB_OBJ_EXT checking
> _unused_slab_obj_exts perhaps. But it would be a short lived change, see below.

Correct, CONFIG_MEMCG was disabled during these tests. We added
BUG_ON() in the slab allocation path to trigger on these low bits and
it did trigger but the same assertion in the freeing path did not
catch anything. We suspected 4996fc547f5b ("mm: let _folio_nr_pages
overlay memcg_data in first tail page") to cause this but Kent's
bisection did not confirm that.

>
> > Fixes: 21c690a349ba ("mm: introduce slabobj_ext to support slab object extensions")
> > Reported-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > Tested-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > Acked-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx>
>
> We'll need this anyway for the not so far future when struct slab is
> separated from struct page so it's fine but it would still be great to find
> the underlying buggy code which this is going to hide.

Yeah, we will try to find the culprit. For now to prevent others from
stepping on this mine I would like to get this in. Thanks!

>
> > ---
> > mm/slub.c | 10 ++++++++++
> > 1 file changed, 10 insertions(+)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index b46f87662e71..dc9e729e1d26 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1973,6 +1973,11 @@ static inline void handle_failed_objexts_alloc(unsigned long obj_exts,
> > #define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | \
> > __GFP_ACCOUNT | __GFP_NOFAIL)
> >
> > +static inline void init_slab_obj_exts(struct slab *slab)
> > +{
> > + slab->obj_exts = 0;
> > +}
> > +
> > int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > gfp_t gfp, bool new_slab)
> > {
> > @@ -2058,6 +2063,10 @@ static inline bool need_slab_obj_ext(void)
> >
> > #else /* CONFIG_SLAB_OBJ_EXT */
> >
> > +static inline void init_slab_obj_exts(struct slab *slab)
> > +{
> > +}
> > +
> > static int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > gfp_t gfp, bool new_slab)
> > {
> > @@ -2637,6 +2646,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> > slab->objects = oo_objects(oo);
> > slab->inuse = 0;
> > slab->frozen = 0;
> > + init_slab_obj_exts(slab);
> >
> > account_slab(slab, oo_order(oo), s, flags);
> >
> >
> > base-commit: c496b37f9061db039b413c03ccd33506175fe6ec
>