Re: [Regression] mm:slab/sheaves: severe performance regression in cross-CPU slab allocation

From: Ming Lei

Date: Wed Mar 11 2026 - 06:51:33 EST


On Wed, Mar 11, 2026 at 6:16 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Wed, Mar 11, 2026 at 10:10:13AM +0900, Harry Yoo wrote:
> > On Fri, Mar 06, 2026 at 06:22:37PM +0800, Ming Lei wrote:
> > > On Fri, Mar 06, 2026 at 09:47:27AM +0100, Vlastimil Babka (SUSE) wrote:
> > > > On 3/6/26 05:55, Harry Yoo wrote:
> > > > > On Thu, Feb 26, 2026 at 07:02:11PM +0100, Vlastimil Babka (SUSE) wrote:
> > > > >> On 2/25/26 10:31, Ming Lei wrote:
> > > > >> > Hi Vlastimil,
> > > > >> >
> > > > >> > On Wed, Feb 25, 2026 at 09:45:03AM +0100, Vlastimil Babka (SUSE) wrote:
> > > > >> >> On 2/24/26 21:27, Vlastimil Babka wrote:
> > > > >> >> >
> > > > >> >> > It made sense to me not to refill sheaves when we can't reclaim, but I
> > > > >> >> > didn't anticipate this interaction with mempools. We could change them
> > > > >> >> > but there might be others using a similar pattern. Maybe it would be for
> > > > >> >> > the best to just drop that heuristic from __pcs_replace_empty_main()
> > > > >> >> > (but carefully as some deadlock avoidance depends on it, we might need
> > > > >> >> > to e.g. replace it with gfpflags_allow_spinning()). I'll send a patch
> > > > >> >> > tomorrow to test this theory, unless someone beats me to it (feel free to).
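> > > > >> >> > Roughly this direction (untested sketch; the real check sits inside
> > > > >> >> > __pcs_replace_empty_main(), whose surrounding context isn't quoted here):
> > > > >> >> >
> > > > >> >> > 	/* current heuristic: refuse to refill when the allocation
> > > > >> >> > 	 * cannot reclaim (blocking not allowed)
> > > > >> >> > 	 */
> > > > >> >> > 	if (!gfpflags_allow_blocking(gfp))
> > > > >> >> > 		return NULL;
> > > > >> >> >
> > > > >> >> > 	/* proposed: refuse only when spinning on locks is forbidden,
> > > > >> >> > 	 * so GFP_NOWAIT mempool users can still refill sheaves
> > > > >> >> > 	 */
> > > > >> >> > 	if (!gfpflags_allow_spinning(gfp))
> > > > >> >> > 		return NULL;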
> > > > >> >> Could you try this then, please? Thanks!
> > > > >> >
> > > > >> > Thanks for working on this issue!
> > > > >> >
> > > > >> > Unfortunately the patch doesn't make a difference in IOPS in the perf test;
> > > > >> > the collected perf profile on the Linus tree (basically 7.0-rc1 with your patch) follows:
> > > > >>
> > > > >> what about this patch in addition to the previous one? Thanks.
> > > > >>
> > > > >> ----8<----
> > > > >> From d3e8118c078996d1372a9f89285179d93971fdb2 Mon Sep 17 00:00:00 2001
> > > > >> From: "Vlastimil Babka (SUSE)" <vbabka@xxxxxxxxxx>
> > > > >> Date: Thu, 26 Feb 2026 18:59:56 +0100
> > > > >> Subject: [PATCH] mm/slab: put barn on every online node
> > > > >>
> > > > >> Including memoryless nodes.
> > > > >>
> > > > >> Signed-off-by: Vlastimil Babka (SUSE) <vbabka@xxxxxxxxxx>
> > > > >> ---
> > > > >
> > > > > Just taking a quick look...
> > > > >
> > > > >> @@ -6121,7 +6122,8 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
> > > > >> if (unlikely(!slab_free_hook(s, object, slab_want_init_on_free(s), false)))
> > > > >> return;
> > > > >>
> > > > >> - if (likely(!IS_ENABLED(CONFIG_NUMA) || slab_nid(slab) == numa_mem_id())
> > > > >> + if (likely(!IS_ENABLED(CONFIG_NUMA) || (slab_nid(slab) == numa_mem_id())
> > > > >> + || !node_isset(slab_nid(slab), slab_nodes))
> > > > >
> > > > > I think you intended !node_isset(numa_mem_id(), slab_nodes)?
> > > > >
> > > > > "Skip freeing to pcs if it's remote free, but memoryless nodes is
> > > > > an exception".
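> > > > >
> > > > > I.e., sketching the intended fast-path check:
> > > > >
> > > > > 	/* free via pcs if the free is node-local, or if the freeing
> > > > > 	 * CPU's node is memoryless (not in slab_nodes), in which case
> > > > > 	 * every free looks remote and pcs is still the best path
> > > > > 	 */
> > > > > 	if (slab_nid(slab) == numa_mem_id() ||
> > > > > 	    !node_isset(numa_mem_id(), slab_nodes))
> > > > > 		free_to_pcs(s, object, true);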
> > > >
> > > > Indeed, thanks! Ming, could you retry with that fixed up please?
> > >
> > > After applying the following change, IOPS is ~25M:
> > >
> > > - delta change on the two patches
> > >
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 085fe49eec68..56fe8bd956c0 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -6142,7 +6142,7 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
> > > return;
> > >
> > > if (likely(!IS_ENABLED(CONFIG_NUMA) || (slab_nid(slab) == numa_mem_id())
> > > - || !node_isset(slab_nid(slab), slab_nodes))
> > > + || !node_isset(numa_mem_id(), slab_nodes))
> > > && likely(!slab_test_pfmemalloc(slab))) {
> > > if (likely(free_to_pcs(s, object, true)))
> > > return;
> > >
> >
> > Hi Ming, thanks a lot for helping with testing!
> >
> > The stats look quite fine to me, but we're still seeing suboptimal IOPS.
> >
> > > - slab stat on patched `815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next`
> >
> > Does that include Vlastimil's patch (fb1091febd66 "mm/slab: allow sheaf
> > refill if blocking is not allowed")?
>
> No, because fb1091febd66 isn't included in `815c8e35511d Merge branch
> 'slab/for-7.0/sheaves'`.
>
> >
> > Next time, could you please test on top of 7.0-rc3 w/
> > the memoryless node patch (w/ the delta above) applied?
>
> IOPS is the same between `815c8e35511d Merge branch 'slab/for-7.0/sheaves' into slab/for-next`
> and 7.0-rc3 with the two patches.
>
> IMO, it should be easier to compare & investigate by focusing on
> 815c8e35511d, given there are only 41 patches between v6.19-rc5 and
> commit 815c8e35511d.
>
> >
> > Also, let us check a few things...
> >
> > 1) Does bumping up sheaf capacity change the slab stats & IOPS?
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 0c906fefc31b..5207279417e2 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -7611,13 +7611,13 @@ static unsigned int calculate_sheaf_capacity(struct kmem_cache *s,
> > * should result in similar lock contention (barn or list_lock)
> > */
> > if (s->size >= PAGE_SIZE)
> > - capacity = 4;
> > + capacity = 6;
> > else if (s->size >= 1024)
> > - capacity = 12;
> > + capacity = 24;
> > else if (s->size >= 256)
> > - capacity = 26;
> > + capacity = 52;
> > else
> > - capacity = 60;
> > + capacity = 120;
> >
> > /* Increment capacity to make sheaf exactly a kmalloc size bucket */
> > size = struct_size_t(struct slab_sheaf, objects, capacity);
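> >
> > The rounding behind that last comment presumably goes along these lines
> > (sketch, not the exact code):
> >
> > 	size = kmalloc_size_roundup(struct_size_t(struct slab_sheaf,
> > 						  objects, capacity));
> > 	capacity = (size - sizeof(struct slab_sheaf)) / sizeof(void *);
> >
> > so doubling the base capacities can also move each sheaf into a larger
> > kmalloc size bucket.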
>
> IOPS increases from 24M to 29M with this patch, on top of 7.0-rc3 with
> Vlastimil's patchset from today.

BTW, the improvement looks unstable; sometimes it reaches 28–29M, but other
times it only reaches 25–26M.

Thanks,