Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache

From: Suren Baghdasaryan

Date: Mon Jun 29 2026 - 00:29:08 EST


On Sun, Jun 28, 2026 at 8:57 PM Harry Yoo <harry@xxxxxxxxxx> wrote:
>
>
> [ Adding Kees Cook for SLAB_BUCKETS conversation ]
>
> The thread:
> https://lore.kernel.org/linux-mm/20260625230029.703750-1-shakeel.butt@xxxxxxxxx/
>
> On 6/29/26 8:37 AM, Suren Baghdasaryan wrote:
> > On Sun, Jun 28, 2026 at 2:22 AM Harry Yoo <harry@xxxxxxxxxx> wrote:
> >> On 6/28/26 4:47 PM, Vlastimil Babka (SUSE) wrote:
> >>> On 6/28/26 5:23 AM, Shakeel Butt wrote:
> >>>> On Sat, Jun 27, 2026 at 07:58:12PM -0700, Shakeel Butt wrote:
> >>>>> On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
> >>>>> [...]
> >>>>>>>>> Fix it structurally by removing cycles of every shape: serve the array
> >>>>>>>>> from a cache strictly larger than the one it describes whenever it would
> >>>>>>>>> otherwise come from the same or a smaller cache. Every reference edge
> >>>>>>>>> then points from a smaller to a larger cache (here kmalloc-1k's array
> >>>>>>>>> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
> >>>>>>>>
> >>>>>>>> This will fix the problem.
> >>>>>>>>
> >>>>>>>> But this will waste memory as we need smaller obj_exts array
> >>>>>>>> as the size gets larger.
> >>>>>>>>
> >>>>>>>> We should probably create a new kmalloc type to avoid cycles instead?
> >>>>>>>> (needed only when memory profiling is enabled, though)
> >>>>>>>>
> >>>>>>>> That would also prevent recursion even further.
> >>>>>>>
> >>>>>>> Yes but I assume that would add kmem caches even for users not using memory
> >>>>>>> profiling. Anyways, I think that is a separate discussion. Am I understanding
> >>>>>>> correctly that you don't have any concerns with this approach?
> >>>>>>
> >>>>>> Umm, the memory waste is a concern?
> >>>>>>
> >>>>>> Minimally I'd now want to only do that size bumping when allocation
> >>>>>> profiling is enabled. Ideally that means both configured in and not booted
> >>>>>> with "never".
> >>>>>>
> >>>>>> We probably should have done that already in 280ea9c3154b2. Because AFAIU
> >>>>>> memcg-only obj_exts arrays don't have this issue (or maybe they do have the
> >>>>>> [1] issue? Harry?). But if memcg-only should keep avoiding the same size
> >>>>>> bucket, it can keep what it was doing and only memalloc profiling would do
> >>>>>> the strictly larger thing.
> >>>>>
> >>>>> memcg should not have this issue as normal kmalloc caches do not serve memcg
> >>>>> charged objects.
> >>>>
> >>>> I am wrong here as I went back and see d8df600b67d7.
> >>
> >> I was confused too :)
> >>
> >>> (8dafa9f5900c upstream)
> >>>
> >>>>>
> >>>>> So here we can do dedicated caches as Harry suggested or make this size bumping
> >>>>> very specialized as Vlastimil suggested. What do we want long term? Orthogonally
> >>>
> >>> Maybe long term we make kmem_buckets unconditional and use that.
> >>>
> >>>>> we do want this fix to be backported easily to older stable kernels. I will see
> >>>>> what this narrowed-down size bumping looks like.
> >>>>>
> >>>>
> >>>> BTW I think we need something like the following, right?
> >>>>
> >>>> 	if (mem_alloc_profiling_enabled()) {
> >>>> 		if (obj_exts_cache->object_size <= s->object_size)
> >>>> 			return s->object_size + 1;
> >>>> 	} else {
> >>>> 		if (obj_exts_cache->object_size == s->object_size)
> >>>> 			return s->object_size + 1;
> >>>> 	}
> >>
> >> We should not add mem_alloc_profiling_enabled() check because,
> >> then we're not fixing this issue on SLUB_TINY, when the caller specifies
> >> __GFP_RECLAIMABLE|__GFP_ACCOUNT without memory allocation profiling.
> >>
> >> `if (!is_kmalloc_normal(s))` check already bails out when it doesn't
> >> need to bump the size.
> >>
> >> So Shakeel's original code will work fine.
> >>
> >> We're only pessimizing memory allocation profiling and
> >> SLUB_TINY && MEMCG users, but (as Vlastimil suggests off-list)
> >> it wouldn't make much sense to enable MEMCG on memory restricted systems
> >> anyway. (IIRC even raspberry pis don't enable the memory controller by
> >> default...)
> >>
> >> I think it's okay to fix the bug first, but we need to address
> >> the memory wastage issue sooner or later if companies (Meta and
> >> Google I guess?) are deploying kernels with memory allocation profiling
> >> on in production systems.
> >
> > Sorry for the delay folks. I just got a chance to read through this thread.
>
> Hi Suren, no worries!
>
> > I think adding a new KMALLOC_TYPE would be the cleanest way to fix
> > this recursion problem once and for all. This size bumping and the
> > special case of SLUB_TINY are quite confusing.
>
> As mentioned by Vlastimil, in the long term, using SLAB_BUCKETS
> infrastructure would be more straightforward than new KMALLOC_TYPE
> because (I think) the kmalloc type is decided purely based on GFP
> flags and we need to somehow work around that. SLAB_BUCKETS provides
> a nice abstraction to do this.
>
> Luckily, SLAB_BUCKETS was introduced in v6.11.
> Unfortunately, SLAB_BUCKETS is optional.
>
> > We could define that new KMALLOC_TYPE only if memory allocation
> > profiling or SLUB_TINY are enabled to avoid new caches when not
> > needed. Does not seem too complex but maybe I'm missing something?
> > WDYT?
>
> I think we need some enhancements to achieve that with SLAB_BUCKETS:
>
> 1. Rename SLAB_BUCKETS to SLAB_BUCKETS_HARDENING
> (w/ SLAB_BUCKETS being a transitional config for _HARDENING)
>
> 2. Make the SLAB_BUCKETS infrastructure unconditional,
> but the decision is made at runtime:
>
> 1) actually creating a kmem_buckets vs.
> 2) falling back to kmalloc.
>
> 3. kmem_buckets_create() creates kmem_buckets only when
> SLAB_BUCKETS_HARDENING is enabled.
>
> 4. SLUB decides (not) to create kmem_buckets for internal use
> during the boot process. Use the kmem_buckets for obj_exts
> array allocation.
>
> Side note: this would unconditionally add the kmem_buckets parameter to
> the kmalloc slowpath. Probably it'd be worth introducing a dedicated
> entrypoint for kmem_buckets instead.

Yeah, this sounds quite complex. Maybe we could use the new
kmalloc_flags() introduced by Vlastimil in [1] to avoid using GFP
flags to indicate that we want to use this new KMALLOC_TYPE? That
seems simpler, though it's not backportable because kmalloc_flags() is
brand new.

[1] https://lore.kernel.org/all/20260615-slab_alloc_flags-v3-0-ce1146d140fb@xxxxxxxxxx/

>
> > If it is more complex than I imagine then I'm fine with Shakeel's
> > approach as a temporary fix.
>
> Since the above requires quite some changes, I'd say let's proceed with
> the fix (since it's one line of code change that fixes a bug),
> and then see how we can make SLAB_BUCKETS changes as minimal
> as possible for backporting?

I was thinking Shakeel's approach for backports and
kmalloc_flags()+KMALLOC_TYPE going forward. Just throwing this out as
an option. I haven't looked closely into SLAB_BUCKETS yet, so that
might indeed be a better direction.
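
For reference, here is a rough userspace model of the size bumping as
described earlier in this thread. It is only a sketch and not the
mm/slub.c code: the toy_* names and the size-class table are
hypothetical, chosen for illustration. It shows that when the array
would land in the same or a smaller size class than the cache it
describes, requesting object_size + 1 pushes it into a strictly larger
class.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical power-of-two size classes; the real kmalloc list differs. */
static const size_t toy_classes[] = {
	8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192
};
#define TOY_NR_CLASSES (sizeof(toy_classes) / sizeof(toy_classes[0]))

/* Smallest toy size class that can hold @bytes, or 0 if none fits. */
size_t toy_size_class(size_t bytes)
{
	for (size_t i = 0; i < TOY_NR_CLASSES; i++)
		if (toy_classes[i] >= bytes)
			return toy_classes[i];
	return 0;
}

/*
 * Size to request for the extension array describing a slab of the
 * cache whose object size is @obj_size, given that the array itself
 * needs @array_bytes.
 *
 * If the array would be served from the same or a smaller class,
 * request obj_size + 1 so it is rounded up to a strictly larger class.
 * Every "array of cache A lives in cache B" edge then goes from a
 * smaller to a larger class, so no cycle can form.
 */
size_t toy_ext_array_request(size_t obj_size, size_t array_bytes)
{
	size_t natural = toy_size_class(array_bytes);

	assert(obj_size > 0);

	if (natural != 0 && natural <= obj_size)
		return obj_size + 1;

	/* Already strictly larger (or too big for any class): unchanged. */
	return array_bytes;
}
```

With these toy classes, a 512-byte array for the 1024-byte class is
requested as 1025 bytes and so is served from the 2048-byte class,
matching the kmalloc-1k to kmalloc-2k example quoted above. In this
model, a bumped request for the largest class fits no class at all.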

>
> --
> Cheers,
> Harry / Hyeonggon