Re: [Regression] mm:slab/sheaves: severe performance regression in cross-CPU slab allocation

From: Hao Li

Date: Fri Mar 06 2026 - 03:32:35 EST


On Fri, Mar 06, 2026 at 01:55:49PM +0900, Harry Yoo wrote:
> On Thu, Feb 26, 2026 at 07:02:11PM +0100, Vlastimil Babka (SUSE) wrote:
> > On 2/25/26 10:31, Ming Lei wrote:
> > > Hi Vlastimil,
> > >
> > > On Wed, Feb 25, 2026 at 09:45:03AM +0100, Vlastimil Babka (SUSE) wrote:
> > >> On 2/24/26 21:27, Vlastimil Babka wrote:
> > >> >
> > >> > It made sense to me not to refill sheaves when we can't reclaim, but I
> > >> > didn't anticipate this interaction with mempools. We could change them
> > >> > but there might be others using a similar pattern. Maybe it would be for
> > >> > the best to just drop that heuristic from __pcs_replace_empty_main()
> > >> > (but carefully as some deadlock avoidance depends on it, we might need
> > >> > to e.g. replace it with gfpflags_allow_spinning()). I'll send a patch
> > >> > tomorrow to test this theory, unless someone beats me to it (feel free to).
> > >> Could you try this then, please? Thanks!
> > >
> > > Thanks for working on this issue!
> > >
> > > Unfortunately the patch doesn't make a difference on IOPS in the perf test,
> > > follows the collected perf profile on linus tree(basically 7.0-rc1 with your patch):
> >
> > what about this patch in addition to the previous one? Thanks.
> >
> > ----8<----
> > From d3e8118c078996d1372a9f89285179d93971fdb2 Mon Sep 17 00:00:00 2001
> > From: "Vlastimil Babka (SUSE)" <vbabka@xxxxxxxxxx>
> > Date: Thu, 26 Feb 2026 18:59:56 +0100
> > Subject: [PATCH] mm/slab: put barn on every online node
> >
> > Including memoryless nodes.
> >
> > Signed-off-by: Vlastimil Babka (SUSE) <vbabka@xxxxxxxxxx>
> > ---
>
> Just taking a quick grasp...
>
> > @@ -6121,7 +6122,8 @@ void slab_free(struct kmem_cache *s, struct slab *slab, void *object,
> > if (unlikely(!slab_free_hook(s, object, slab_want_init_on_free(s), false)))
> > return;
> >
> > - if (likely(!IS_ENABLED(CONFIG_NUMA) || slab_nid(slab) == numa_mem_id())
> > + if (likely(!IS_ENABLED(CONFIG_NUMA) || (slab_nid(slab) == numa_mem_id())
> > + || !node_isset(slab_nid(slab), slab_nodes))
>
> I think you intended !node_isset(numa_mem_id(), slab_nodes)?

Good catch! It could also explain why CPUs on memoryless nodes show a higher
barn_get_fail count: they have too few sheaves in the barn...
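
To spell out the difference, here is a minimal userspace sketch (not kernel
code) of the two conditions. The helper names, the 32-bit mask standing in for
slab_nodes, and local_nid standing in for numa_mem_id() are all made up for
illustration. It also assumes that a CPU on a memoryless node sees its own node
id there and that a slab always sits on a node present in slab_nodes, which may
not hold on every configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of slab_nodes: bit n set means node n has memory. */
typedef uint32_t nodemask_model_t;

static bool node_has_memory(int nid, nodemask_model_t slab_nodes)
{
	assert(nid >= 0 && nid < 32);
	return (slab_nodes >> nid) & 1u;
}

/* Check as posted in the patch: the extra clause tests the slab's node. */
static bool use_pcs_as_posted(int slab_nid, int local_nid,
			      nodemask_model_t slab_nodes)
{
	return slab_nid == local_nid ||
	       !node_has_memory(slab_nid, slab_nodes);
}

/* Check as suggested in review: the extra clause tests the freeing CPU's node. */
static bool use_pcs_as_suggested(int slab_nid, int local_nid,
				 nodemask_model_t slab_nodes)
{
	return slab_nid == local_nid ||
	       !node_has_memory(local_nid, slab_nodes);
}
```

With node 0 having memory and node 1 memoryless (mask 0x1), a CPU on node 1
freeing an object from a slab on node 0 gets false from the posted check and
true from the suggested one. Under these assumptions the posted clause never
fires, so frees from memoryless nodes would keep bypassing pcs.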

>
> "Skip freeing to pcs if it's remote free, but memoryless nodes is
> an exception".
>
> > && likely(!slab_test_pfmemalloc(slab))) {
> > if (likely(free_to_pcs(s, object, true)))
> > return;
>
> --
> Cheers,
> Harry / Hyeonggon