Re: [PATCH] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache
From: Harry Yoo
Date: Sun Jun 28 2026 - 23:58:02 EST
[ Adding Kees Cook for SLAB_BUCKETS conversation ]
The thread:
https://lore.kernel.org/linux-mm/20260625230029.703750-1-shakeel.butt@xxxxxxxxx/
On 6/29/26 8:37 AM, Suren Baghdasaryan wrote:
> On Sun, Jun 28, 2026 at 2:22 AM Harry Yoo <harry@xxxxxxxxxx> wrote:
>> On 6/28/26 4:47 PM, Vlastimil Babka (SUSE) wrote:
>>> On 6/28/26 5:23 AM, Shakeel Butt wrote:
>>>> On Sat, Jun 27, 2026 at 07:58:12PM -0700, Shakeel Butt wrote:
>>>>> On Fri, Jun 26, 2026 at 07:11:33PM +0200, Vlastimil Babka (SUSE) wrote:
>>>>> [...]
>>>>>>>>> Fix it structurally by removing cycles of every shape: serve the array
>>>>>>>>> from a cache strictly larger than the one it describes whenever it would
>>>>>>>>> otherwise come from the same or a smaller cache. Every reference edge
>>>>>>>>> then points from a smaller to a larger cache (here kmalloc-1k's array
>>>>>>>>> moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
>>>>>>>>
>>>>>>>> This will fix the problem.
>>>>>>>>
>>>>>>>> But this will waste memory, as the obj_exts array we need gets smaller
>>>>>>>> as the object size gets larger.
>>>>>>>>
>>>>>>>> We should probably create a new kmalloc type to avoid cycles instead?
>>>>>>>> (needed only when memory profiling is enabled, though)
>>>>>>>>
>>>>>>>> That would also prevent recursion even further.
>>>>>>>
>>>>>>> Yes but I assume that would add kmem caches even for users not using memory
>>>>>>> profiling. Anyways, I think that is a separate discussion. Am I understanding
>>>>>>> correctly that you don't have any concerns with this approach?
>>>>>>
>>>>>> Umm, the memory waste is a concern?
>>>>>>
>>>>>> Minimally I'd now want to only do that size bumping when allocation
>>>>>> profiling is enabled. Ideally that means both configured in and not booted
>>>>>> with "never".
>>>>>>
>>>>>> We probably should have done that already in 280ea9c3154b2. Because AFAIU
>>>>>> memcg-only obj_exts arrays don't have this issue (or maybe they do have the
>>>>>> [1] issue? Harry?). But if memcg-only should keep avoiding the same size
>>>>>> bucket, it can keep what it was doing and only memalloc profiling would do
>>>>>> the strictly larger thing.
>>>>>
>>>>> memcg should not have this issue as normal kmalloc caches do not serve memcg
>>>>> charged objects.
>>>>
>>>> I was wrong here; I went back and saw d8df600b67d7.
>>
>> I was confused too :)
>>
>>> (8dafa9f5900c upstream)
>>>
>>>>>
>>>>> So here we can do dedicated caches as Harry suggested or make this size bumping
>>>>> very specialized as Vlastimil suggested. What do we want long term? Orthogonally
>>>
>>> Maybe long term we make kmem_buckets unconditional and use that.
>>>
>>>>> we do want this fix to be backported easily to older stable kernels. I will see
>>>>> what this narrowed-down size bumping looks like.
>>>>>
>>>>
>>>> BTW I think we need something like the following, right?
>>>>
>>>> 	if (mem_alloc_profiling_enabled()) {
>>>> 		if (obj_exts_cache->object_size <= s->object_size)
>>>> 			return s->object_size + 1;
>>>> 	} else {
>>>> 		if (obj_exts_cache->object_size == s->object_size)
>>>> 			return s->object_size + 1;
>>>> 	}
>>
>> We should not add the mem_alloc_profiling_enabled() check, because
>> then we would not fix this issue on SLUB_TINY when the caller specifies
>> __GFP_RECLAIMABLE|__GFP_ACCOUNT without memory allocation profiling.
>>
>> The `if (!is_kmalloc_normal(s))` check already bails out when there is
>> no need to bump the size.
>>
>> So Shakeel's original code will work fine.
>>
>> We're only pessimizing memory allocation profiling and
>> SLUB_TINY && MEMCG users, but (as Vlastimil suggests off-list)
>> it wouldn't make much sense to enable MEMCG on memory restricted systems
>> anyway. (IIRC even Raspberry Pis don't enable the memory controller by
>> default...)
>>
>> I think it's okay to fix the bug first, but we need to address
>> the memory wastage issue sooner or later if companies (Meta and
>> Google I guess?) are deploying kernels with memory allocation profiling
>> on in production systems.
>
> Sorry for the delay folks. I just got a chance to read through this thread.
Hi Suren, no worries!
> I think adding a new KMALLOC_TYPE would be the cleanest way to fix
> this recursion problem once and for all. This size bumping and the
> special case of SLUB_TINY are quite confusing.
As Vlastimil mentioned, in the long term, using the SLAB_BUCKETS
infrastructure would be more straightforward than a new KMALLOC_TYPE,
because (I think) the kmalloc type is decided purely based on GFP
flags, and we would need to work around that somehow. SLAB_BUCKETS
provides a nice abstraction to do this.
Luckily, SLAB_BUCKETS was introduced in v6.11.
Unfortunately, SLAB_BUCKETS is optional.
> We could define that new KMALLOC_TYPE only if memory allocation profiling or SLUB_TINY are
> enabled to avoid new caches when not needed. Does not seem too complex
> but maybe I'm missing something? WDYT?
I think we need some enhancements to achieve that with SLAB_BUCKETS:
1. Rename SLAB_BUCKETS to SLAB_BUCKETS_HARDENING
   (w/ SLAB_BUCKETS being a transitional config for _HARDENING).
2. Make the SLAB_BUCKETS infrastructure unconditional,
   but make the decision at runtime between:
   1) actually creating a kmem_buckets, and
   2) falling back to kmalloc.
3. kmem_buckets_create() creates kmem_buckets only when
   SLAB_BUCKETS_HARDENING is enabled.
4. SLUB decides during boot whether to create kmem_buckets for
   internal use, and uses them for obj_exts array allocation.
Side note: this would unconditionally add the kmem_buckets parameter to
the kmalloc slowpath. Probably it'd be worth introducing a dedicated
entrypoint for kmem_buckets instead.
> If it is more complex than I imagine, then I'm fine with Shakeel's
> approach as a temporary fix.
Since the above requires quite a few changes, I'd say let's proceed with
the fix (it's a one-line change that fixes a bug),
and then see how we can keep the SLAB_BUCKETS changes as minimal
as possible for backporting?
--
Cheers,
Harry / Hyeonggon