Re: [PATCH 1/1] slab: ensure slab->obj_exts is clear in a newly allocated slab page

From: Suren Baghdasaryan
Date: Fri Apr 11 2025 - 13:45:35 EST


On Fri, Apr 11, 2025 at 9:59 AM Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
>
> On Fri, Apr 11, 2025 at 9:27 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >
> > On 4/11/25 17:57, Suren Baghdasaryan wrote:
> > > ktest recently reported crashes while running several buffered io tests
> > > with __alloc_tagging_slab_alloc_hook() at the top of the crash call stack.
> > > The signature indicates an invalid address dereference with low bits of
> > > slab->obj_exts being set. The bits were outside of the range used by
> > > page_memcg_data_flags and objext_flags and hence were not masked out
> > > by slab_obj_exts() when obtaining the pointer stored in slab->obj_exts.
> > > The typical crash log looks like this:
> > >
> > > 00510 Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010
> > > 00510 Mem abort info:
> > > 00510 ESR = 0x0000000096000045
> > > 00510 EC = 0x25: DABT (current EL), IL = 32 bits
> > > 00510 SET = 0, FnV = 0
> > > 00510 EA = 0, S1PTW = 0
> > > 00510 FSC = 0x05: level 1 translation fault
> > > 00510 Data abort info:
> > > 00510 ISV = 0, ISS = 0x00000045, ISS2 = 0x00000000
> > > 00510 CM = 0, WnR = 1, TnD = 0, TagAccess = 0
> > > 00510 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> > > 00510 user pgtable: 4k pages, 39-bit VAs, pgdp=0000000104175000
> > > 00510 [0000000000000010] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000
> > > 00510 Internal error: Oops: 0000000096000045 [#1] SMP
> > > 00510 Modules linked in:
> > > 00510 CPU: 10 UID: 0 PID: 7692 Comm: cat Not tainted 6.15.0-rc1-ktest-g189e17946605 #19327 NONE
> > > 00510 Hardware name: linux,dummy-virt (DT)
> > > 00510 pstate: 20001005 (nzCv daif -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
> > > 00510 pc : __alloc_tagging_slab_alloc_hook+0xe0/0x190
> > > 00510 lr : __kmalloc_noprof+0x150/0x310
> > > 00510 sp : ffffff80c87df6c0
> > > 00510 x29: ffffff80c87df6c0 x28: 000000000013d1ff x27: 000000000013d200
> > > 00510 x26: ffffff80c87df9e0 x25: 0000000000000000 x24: 0000000000000001
> > > 00510 x23: ffffffc08041953c x22: 000000000000004c x21: ffffff80c0002180
> > > 00510 x20: fffffffec3120840 x19: ffffff80c4821000 x18: 0000000000000000
> > > 00510 x17: fffffffec3d02f00 x16: fffffffec3d02e00 x15: fffffffec3d00700
> > > 00510 x14: fffffffec3d00600 x13: 0000000000000200 x12: 0000000000000006
> > > 00510 x11: ffffffc080bb86c0 x10: 0000000000000000 x9 : ffffffc080201e58
> > > 00510 x8 : ffffff80c4821060 x7 : 0000000000000000 x6 : 0000000055555556
> > > 00510 x5 : 0000000000000001 x4 : 0000000000000010 x3 : 0000000000000060
> > > 00510 x2 : 0000000000000000 x1 : ffffffc080f50cf8 x0 : ffffff80d801d000
> > > 00510 Call trace:
> > > 00510 __alloc_tagging_slab_alloc_hook+0xe0/0x190 (P)
> > > 00510 __kmalloc_noprof+0x150/0x310
> > > 00510 __bch2_folio_create+0x5c/0xf8
> > > 00510 bch2_folio_create+0x2c/0x40
> > > 00510 bch2_readahead+0xc0/0x460
> > > 00510 read_pages+0x7c/0x230
> > > 00510 page_cache_ra_order+0x244/0x3a8
> > > 00510 page_cache_async_ra+0x124/0x170
> > > 00510 filemap_readahead.isra.0+0x58/0xa0
> > > 00510 filemap_get_pages+0x454/0x7b0
> > > 00510 filemap_read+0xdc/0x418
> > > 00510 bch2_read_iter+0x100/0x1b0
> > > 00510 vfs_read+0x214/0x300
> > > 00510 ksys_read+0x6c/0x108
> > > 00510 __arm64_sys_read+0x20/0x30
> > > 00510 invoke_syscall.constprop.0+0x54/0xe8
> > > 00510 do_el0_svc+0x44/0xc8
> > > 00510 el0_svc+0x18/0x58
> > > 00510 el0t_64_sync_handler+0x104/0x130
> > > 00510 el0t_64_sync+0x154/0x158
> > > 00510 Code: d5384100 f9401c01 b9401aa3 b40002e1 (f8227881)
> > > 00510 ---[ end trace 0000000000000000 ]---
> > > 00510 Kernel panic - not syncing: Oops: Fatal exception
> > > 00510 SMP: stopping secondary CPUs
> > > 00510 Kernel Offset: disabled
> > > 00510 CPU features: 0x0000,000000e0,00000410,8240500b
> > > 00510 Memory Limit: none
> > >
> > > Investigation indicates that these bits are already set when we allocate
> > > slab page and are not zeroed out after allocation. We are not yet sure
> > > why these crashes start happening only recently but regardless of the
> > > reason, not initializing a field that gets used later is wrong. Fix it
> > > by initializing slab->obj_exts during slab page allocation.
> >
> > slab->obj_exts overlays page->memcg_data and the checks on page alloc and
> > page free should catch any non-zero values, i.e. page_expected_state()
> > page_bad_reason() so if anyone is e.g. UAF-writing there or leaving garbage
> > there while freeing the page it's a bug.
> >
> > Perhaps CONFIG_MEMCG is disabled in the ktests and thus the checks are not
> > happening? We could extend them for CONFIG_SLAB_OBJ_EXT checking
> > _unused_slab_obj_exts perhaps. But it would be a short lived change, see below.
>
> Correct, CONFIG_MEMCG was disabled during these tests. We added
> BUG_ON() in the slab allocation path to trigger on these low bits and
> it did trigger but the same assertion in the freeing path did not
> catch anything. We suspected 4996fc547f5b ("mm: let _folio_nr_pages
> overlay memcg_data in first tail page") to cause this but Kent's
> bisection did not confirm that.
>
> >
> > > Fixes: 21c690a349ba ("mm: introduce slabobj_ext to support slab object extensions")
> > > Reported-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > > Tested-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > > Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
> > > Acked-by: Kent Overstreet <kent.overstreet@xxxxxxxxx>
> > > Cc: <stable@xxxxxxxxxxxxxxx>
> >
> > We'll need this anyway for the not so far future when struct slab is
> > separated from struct page so it's fine but it would still be great to find
> > the underlying buggy code which this is going to hide.
>
> Yeah, we will try to find the culprit. For now to prevent others from
> stepping on this mine I would like to get this in. Thanks!

Kent asked me to forward this (his email is misbehaving for some reason):

Yes, ktest doesn't flip on CONFIG_MEMCG.

Those checks you're talking about are also behind CONFIG_DEBUG_VM,
which isn't normally on. I did do some runs with it on and it didn't
fire - only additional asserts Suren and I added - so something's
missing.

In the meantime, this needs to go in quickly as a hotfix because it's
a 6.15-rc1 regression, and I've been getting distros to enable memory
allocation profiling and I'd be shocked if it doesn't cause memcg
crashes as well.




>
> >
> > > ---
> > > mm/slub.c | 10 ++++++++++
> > > 1 file changed, 10 insertions(+)
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index b46f87662e71..dc9e729e1d26 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -1973,6 +1973,11 @@ static inline void handle_failed_objexts_alloc(unsigned long obj_exts,
> > > #define OBJCGS_CLEAR_MASK (__GFP_DMA | __GFP_RECLAIMABLE | \
> > > __GFP_ACCOUNT | __GFP_NOFAIL)
> > >
> > > +static inline void init_slab_obj_exts(struct slab *slab)
> > > +{
> > > + slab->obj_exts = 0;
> > > +}
> > > +
> > > int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > gfp_t gfp, bool new_slab)
> > > {
> > > @@ -2058,6 +2063,10 @@ static inline bool need_slab_obj_ext(void)
> > >
> > > #else /* CONFIG_SLAB_OBJ_EXT */
> > >
> > > +static inline void init_slab_obj_exts(struct slab *slab)
> > > +{
> > > +}
> > > +
> > > static int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
> > > gfp_t gfp, bool new_slab)
> > > {
> > > @@ -2637,6 +2646,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
> > > slab->objects = oo_objects(oo);
> > > slab->inuse = 0;
> > > slab->frozen = 0;
> > > + init_slab_obj_exts(slab);
> > >
> > > account_slab(slab, oo_order(oo), s, flags);
> > >
> > >
> > > base-commit: c496b37f9061db039b413c03ccd33506175fe6ec
> >