Re: [syzbot] [mm?] [f2fs?] [exfat?] memory leak in __kfree_rcu_sheaf

From: Catalin Marinas

Date: Fri Mar 06 2026 - 14:35:15 EST


On Wed, Mar 04, 2026 at 02:39:47PM +0100, Vlastimil Babka (SUSE) wrote:
> On 3/4/26 2:30 AM, Harry Yoo wrote:
> > [+Cc adding Catalin for kmemleak bits]
> > On Mon, Mar 02, 2026 at 09:39:48AM +0100, Vlastimil Babka (SUSE) wrote:
> >> On 3/2/26 04:41, Qing Wang wrote:
> >>> #syz test
> >>>
> >>> diff --git a/mm/slub.c b/mm/slub.c
> >>> index cdc1e652ec52..387979b89120 100644
> >>> --- a/mm/slub.c
> >>> +++ b/mm/slub.c
> >>> @@ -6307,15 +6307,21 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
> >>> goto fail;
> >>>
> >>> if (!local_trylock(&s->cpu_sheaves->lock)) {
> >>> - barn_put_empty_sheaf(barn, empty);
> >>> + if (barn && data_race(barn->nr_empty) < MAX_EMPTY_SHEAVES)
> >>> + barn_put_empty_sheaf(barn, empty);
> >>> + else
> >>> + free_empty_sheaf(s, empty);
> >>> goto fail;
> >>> }
> >>>
> >>> pcs = this_cpu_ptr(s->cpu_sheaves);
> >>>
> >>> - if (unlikely(pcs->rcu_free))
> >>> - barn_put_empty_sheaf(barn, empty);
> >>> - else
> >>> + if (unlikely(pcs->rcu_free)) {
> >>> + if (barn && data_race(barn->nr_empty) < MAX_EMPTY_SHEAVES)
> >>> + barn_put_empty_sheaf(barn, empty);
> >>> + else
> >>> + free_empty_sheaf(s, empty);
> >>> + } else
> >>> pcs->rcu_free = empty;
> >>> }
> >>
> >> I don't think this would fix any leak, and syzbot agrees. It would limit the
> >> empty sheaves in barn more strictly, but they are not leaked.
> >> Hmm, I don't see any leak in __kfree_rcu_sheaf() or rcu_free_sheaf(). I
> >> wonder whether kmemleak lacks visibility into the barns or pcs structures
> >> as roots when searching for which objects are considered referenced, or
> >> something?
> >
> > Objects that are allocated from slab and percpu allocator should be
> > properly tracked by kmemleak. But those allocated with
> > gfpflags_allow_spinning() == false are not tracked by kmemleak.
> >
> > When barns and sheaves are allocated early (!gfpflags_allow_spinning()
> > due to gfp_allowed_mask) and it skips kmemleak_alloc_recursive(),
> > it could produce false positives because from kmemleak's point of view,
> > the objects are not reachable from the root set (data section, stack,
> > etc.).
>
> Good point.
>
> > To me it seems kmemleak should gain allow_spin == false support
> > sooner or later.
>
> Or we figure out how to deal with the spurious allow_spin == false during
> boot. Here I'm a bit confused how exactly it happens because AFAICS in
> slub we apply gfp_allowed_mask only when allocating a new slab, and in
> slab_post_alloc_hook() we apply it to init_mask. That is indeed passed
> to kmemleak_alloc_recursive() but not used for the
> gfpflags_allow_spinning() decision. kmemleak_alloc_recursive() should
> succeed because nobody should be holding any locks that would require
> spinning.

I don't fully understand what goes on. If kmemleak_alloc_recursive()
failed to allocate for some reason (other than SLAB_NOLEAKTRACE), it
would loudly disable kmemleak altogether and stop reporting leaks. Also,
kmemleak doesn't care about allow_spin; it is only the slub code that
avoids calling kmemleak if spinning is not allowed (as kmemleak takes
some locks and may call back into the slab allocator).

I wonder whether some early kmem_cache_node allocations, like the ones
in early_kmem_cache_node_alloc(), are not tracked, in which case
kmemleak cannot find n->barn. I got lost in the slub code, but something
like this:

-----------8<-----------------------------------
diff --git a/mm/slub.c b/mm/slub.c
index 0c906fefc31b..401557ff5487 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -7513,6 +7513,7 @@ static void early_kmem_cache_node_alloc(int node)
slab->freelist = get_freepointer(kmem_cache_node, n);
slab->inuse = 1;
kmem_cache_node->node[node] = n;
+ kmemleak_alloc(n, sizeof(*n), 1, GFP_NOWAIT);
init_kmem_cache_node(n, NULL);
inc_slabs_node(kmem_cache_node, node, slab->objects);

-------------8<----------------------------------------

Another thing I noticed, though I'm not sure it's related: we should
probably ignore an object once it has been passed to kvfree_call_rcu(),
similar to what we do on the main path in this function. Also see commit
5f98fd034ca6 ("rcu: kmemleak: Ignore kmemleak false positives when
RCU-freeing objects") where we added this kmemleak_ignore().

---------8<-----------------------------------
diff --git a/mm/slab_common.c b/mm/slab_common.c
index d5a70a831a2a..73f4668d870d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1954,8 +1954,14 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
if (!head)
might_sleep();

- if (!IS_ENABLED(CONFIG_PREEMPT_RT) && kfree_rcu_sheaf(ptr))
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT) && kfree_rcu_sheaf(ptr)) {
+ /*
+ * The object is now queued for deferred freeing via an RCU
+ * sheaf. Tell kmemleak to ignore it.
+ */
+ kmemleak_ignore(ptr);
return;
+ }

// Queue the object but don't yet schedule the batch.
if (debug_rcu_head_queue(ptr)) {
----------------8<-----------------------------------

--
Catalin