Re: [PATCH] slab: fix memory leak when refill_sheaf() fails
From: Hao Li
Date: Wed Mar 11 2026 - 12:33:31 EST
On Wed, Mar 11, 2026 at 10:46:23PM +0800, Hao Li wrote:
> On Wed, Mar 11, 2026 at 05:36:17PM +0800, Qing Wang wrote:
> > When refill_sheaf() partially fills one sheaf (e.g., fills 5 objects
> > but need to fill 10), it will update sheaf->size and return -ENOMEM.
> > However, the callers (alloc_full_sheaf() and __pcs_replace_empty_main())
> > directly call free_empty_sheaf() on failure, which only does kfree(sheaf),
> > causing the partially allocated objects memory in sheaf->objects[] leaked.
> >
> > Fix this by calling sheaf_flush_unused() before free_empty_sheaf() to
> > free objects of sheaf->objects[]. And also add a WARN_ON() in
> > free_empty_sheaf() to catch any future cases where a non-empty sheaf is
> > being freed.
> >
> > Fixes: 2d517aa09bbc ("slab: add opt-in caching layer of percpu sheaves")
> > Signed-off-by: Qing Wang <wangqing7171@xxxxxxxxx>
> > ---
>
> This fix looks good to me.
> So feel free to add:
>
> Reviewed-by: Hao Li <hao.li@xxxxxxxxx>
>
> I also want to bring up another point here, although it may be outside the
> scope of the current fix.
>
> When I looked into the refill_sheaf() path, I found a refill failure does not
> guarantee that the sheaf remains intact: refill_sheaf() can partially fill the
> sheaf before failing. This non-intact behavior propagates to its caller,
> __prefill_sheaf_pfmemalloc(), which therefore also cannot assume that the sheaf
> is still intact after a refill failure.
>
> However, the comment for kmem_cache_refill_sheaf() says that "if the refill
> fails (returning -ENOMEM), the existing sheaf is left intact." That means the
> behavior of __prefill_sheaf_pfmemalloc() - where the sheaf may be left
> partially filled on refill failure - contradicts the API contract of
> kmem_cache_refill_sheaf().
>
> Maybe we can add rollback logic to __prefill_sheaf_pfmemalloc() so that it
> provides intact semantics, preventing the non-intact behavior of refill_sheaf()
> from propagating up to kmem_cache_refill_sheaf().
Looking at this a bit more, after checking the current callers, it seems that
the existing callers of kmem_cache_refill_sheaf() are not relying on the sheaf
remaining intact on refill failure.
If so, then another possible option might be to update the comment for
kmem_cache_refill_sheaf() to match the current behavior, rather than adding
rollback logic.
So it may just come down to whether we want to preserve the documented
semantics in the implementation, or adjust the comment to reflect what the code
already does.
I may be missing some intended dependency here, though.
--
Thanks,
Hao