[PATCH v2] mm/slub: serve slabobj_ext array from a strictly larger kmalloc cache

From: Shakeel Butt

Date: Mon Jun 29 2026 - 22:44:46 EST


A production host in the Meta fleet (6.16 kernel, memory allocation
profiling enabled) panicked with a kernel stack overflow while a kernel
driver was freeing a resource:

BUG: TASK stack guard page was hit
Oops: stack guard page
RIP: 0010:kfree+0x8/0x5d0
Call Trace:
__free_slab+0x66/0xc0
kfree+0x3f0/0x5d0
... ( ~125x __free_slab <-> kfree ) ...
<kernel driver freeing a resource>
do_syscall_64

The crash dump shows a 125-deep __free_slab<->kfree recursion that
overflowed the 16 KiB kernel stack.

What happened: a KMALLOC_NORMAL slab's obj_exts array (used by allocation
profiling / memcg accounting) is itself kmalloc()'d from a KMALLOC_NORMAL
cache, so the "slab holds another slab's obj_exts array" relation can form
cycles. With sizeof(struct slabobj_ext) == 16 and the host's geometry:

- kmalloc-512 has 64 objects/slab -> array is 64*16 == 1024 bytes,
served from kmalloc-1k;
- kmalloc-1k has 32 objects/slab -> array is 32*16 == 512 bytes,
served from kmalloc-512.

A kmalloc-512 slab and a kmalloc-1k slab therefore hold each other's
obj_exts array. Discarding one frees the other's array, which empties and
discards that slab, which frees the first's array, and so on:
__free_slab() -> free_slab_obj_exts() -> kfree() -> discard_slab() ->
__free_slab() recurses along the cycle until the stack is exhausted. The
dump confirms it: the recursion's slabs strictly alternate kmalloc-512
(obj_exts in kmalloc-1k) and kmalloc-1k (obj_exts in kmalloc-512), and
mem_alloc_profiling_key was enabled.

Commit 280ea9c3154b ("mm/slab: avoid allocating slabobj_ext array from
its own slab") is not sufficient: it bumps the allocation size only when
the array would come from the *same* cache (object_size ==). At the
geometry above neither cache is self-referential (512 != 1024 and
1024 != 512), so the bump never triggers and the kmalloc-512 <-> kmalloc-1k
cross cycle remains.

Fix it structurally by removing cycles of every shape: serve the array
from a cache strictly larger than the one it describes whenever it would
otherwise come from the same or a smaller cache. Every reference edge
then points from a smaller to a larger cache (here kmalloc-1k's array
moves to kmalloc-2k), so the relation is a DAG and cannot contain a cycle.
No slab can be self- or cross-pinned, the tear-down recursion is bounded
by the number of kmalloc size classes (it terminates at the large-kmalloc
path, which carries no obj_exts), and profiling/accounting coverage is
unchanged - the array is still allocated, only relocated.

Reproduced on next-20260623 at the same geometry: churning
kmalloc-512/kmalloc-1k under vm.mem_profiling and then shrinking leaves
kmalloc-512 with thousands of unreclaimable objects without this patch
(8056) and at baseline with it (847).

Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths")
Reported-by: Danielle Costantino <dcostantino@xxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
---
Changes in v2:
- Drop the now-stale comment above the object_size comparison (Harry Yoo).
- Add a comment above the !is_kmalloc_normal() check explaining that the
size is bumped only when the object itself comes from KMALLOC_NORMAL,
i.e. via memory allocation profiling or memcg on SLUB_TINY (Harry Yoo).
- Add Cc: stable; v6.12 and v6.18 are affected (Harry Yoo).
- Restore the Reported-by tag.
No functional change from v1 (comments and tags only).

v1: https://lore.kernel.org/all/20260625230029.703750-1-shakeel.butt@xxxxxxxxx/

mm/slub.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 9ec774dc7009..0c30d689820a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2124,15 +2124,14 @@ static inline void init_slab_obj_exts(struct slab *slab)
}

/*
- * Calculate the allocation size for slabobj_ext array.
+ * Size of the slabobj_ext array for @slab.
*
- * When memory allocation profiling is enabled, the obj_exts array
- * could be allocated from the same slab cache it's being allocated for.
- * This would prevent the slab from ever being freed because it would
- * always contain at least one allocated object (its own obj_exts array).
- *
- * To avoid this, increase the allocation size when we detect the array
- * may come from the same cache, forcing it to use a different cache.
+ * The array is itself kmalloc()'d. If it came from the same or a smaller
+ * kmalloc cache than @s, the "slab holds another slab's array" relation could
+ * form a cycle (self, or e.g. kmalloc-512 <-> kmalloc-1k) that pins the slabs
+ * forever and recurses via free_slab_obj_exts() -> kfree() -> discard_slab()
+ * at teardown. Force it into a strictly larger cache to keep that relation a
+ * DAG (acyclic).
*/
static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
struct slab *slab, gfp_t gfp)
@@ -2143,18 +2142,19 @@ static inline size_t obj_exts_alloc_size(struct kmem_cache *s,
if (sz > KMALLOC_MAX_CACHE_SIZE)
return sz;

+ /*
+ * Only bump the size when the object (not the obj_exts array) is
+ * allocated from KMALLOC_NORMAL, either by memory allocation profiling
+ * or memcg on SLUB_TINY with __GFP_RECLAIMABLE|__GFP_ACCOUNT.
+ * Otherwise, obj_exts allocations cannot form a cycle between
+ * kmalloc caches.
+ */
if (!is_kmalloc_normal(s))
return sz;

obj_exts_cache = kmalloc_slab(sz, NULL, gfp, __kmalloc_token(0));
- /*
- * We can't simply compare s with obj_exts_cache, because partitioned kmalloc
- * caches have multiple caches per size, selected by caller address or type.
- * Since caller address or type may differ between kmalloc_slab() and actual
- * allocation, bump size when sizes are equal.
- */
- if (s->object_size == obj_exts_cache->object_size)
- return obj_exts_cache->object_size + 1;
+ if (obj_exts_cache->object_size <= s->object_size)
+ return s->object_size + 1;

return sz;
}
--
2.53.0-Meta