Re: [PATCH v4 00/22] slab: replace cpu (partial) slabs with sheaves

From: Hao Li

Date: Thu Jan 29 2026 - 11:10:59 EST

On Thu, Jan 29, 2026 at 04:28:01PM +0100, Vlastimil Babka wrote:
> On 1/29/26 16:18, Hao Li wrote:
> > Hi Vlastimil,
> >
> > I conducted a detailed performance evaluation of each patch on my setup.
>
> Thanks! What was the benchmark(s) used?

I'm currently using the mmap2 test case from will-it-scale. The machine is the
same AMD 2-socket system, with 2 nodes per socket and 192 CPUs in total, with
SMT disabled. I ran the test with 64, 128, and 192 processes.

> Importantly, does it rely on vma/maple_node objects?

Yes, this test primarily puts a lot of pressure on maple_node.
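
To illustrate the kind of workload meant here, below is a rough sketch of an
mmap/munmap loop. It is not the will-it-scale mmap2 source: the function name,
the 128 MiB mapping size, and the use of anonymous memory are assumptions made
for illustration. Each map/unmap pair creates and destroys a VMA, so the
kernel has to update the maple tree, which is where maple_node allocations
come from.

```python
import mmap

# Hypothetical mapping size, chosen only for illustration.
MAP_SIZE = 128 * 1024 * 1024


def churn_mappings(iterations, size=MAP_SIZE):
    """Map and unmap an anonymous region `iterations` times.

    Returns the number of completed map/unmap pairs.
    """
    done = 0
    for _ in range(iterations):
        # fileno=-1 requests an anonymous mapping (no backing file).
        region = mmap.mmap(-1, size)
        # Closing the object unmaps the region again.
        region.close()
        done += 1
    return done


if __name__ == "__main__":
    print(churn_mappings(1000))
```

Running many copies of such a loop in parallel, one per process, is what
pushes allocation and free traffic through the slab caches.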

> So previously those would become kind of double
> cached by both sheaves and cpu (partial) slabs (and thus hopefully benefited
> more than they should) since sheaves introduction in 6.18, and now they are
> not double cached anymore?

Exactly, since version 6.18, maple_node has indeed benefited from a dual-layer
cache.

I did wonder whether this is not really a performance regression, but rather
performance returning to its baseline now that one layer of caching is gone.

However, verifying this idea would require completely disabling the sheaf
mechanism on version 6.19-rc5 while leaving the rest of the SLUB code untouched.
It would be great to hear any suggestions on how this might be approached.

>
> > During my tests, I observed two points in the series where performance
> > regressions occurred:
> >
> > Patch 10: I noticed a ~16% regression in my environment. My hypothesis is
> > that with this patch, the allocation fast path bypasses the percpu partial
> > list, leading to increased contention on the node list.
>
> That makes sense.
>
> > Patch 12: This patch seems to introduce an additional ~9.7% regression. I
> > suspect this might be because the free path also loses buffering from the
> > percpu partial list, further exacerbating node list contention.
>
> Hmm yeah... we did put the previously full slabs there, avoiding the lock.
>
> > These are the only two patches in the series where I observed noticeable
> > regressions. The rest of the patches did not show significant performance
> > changes in my tests.
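
For reference, the two figures above can be combined into one overall number.
The sketch below assumes each percentage is measured relative to the
throughput just before that patch, which the report does not state; if both
were instead measured against the same baseline, they would simply add up.
The function name is hypothetical.

```python
def combined_regression(*regressions):
    """Compound a sequence of fractional regressions.

    Each value is the fractional throughput loss relative to the state
    just before that step (an assumption, see the text above).
    """
    remaining = 1.0
    for r in regressions:
        remaining *= 1.0 - r
    return 1.0 - remaining


if __name__ == "__main__":
    # Patch 10: ~16%, patch 12: ~9.7% (figures from the report above).
    total = combined_regression(0.16, 0.097)
    print(f"combined regression: {total:.1%}")
```

Under that assumption the remaining throughput is 0.84 * 0.903 of the
starting value, i.e. a combined loss of about 24%.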
> >
> > I hope these test results are helpful.
>
> They are, thanks. I'd however hope it's just some particular test that has
> these regressions,

Yes, I hope so too. And the mmap2 test case is indeed quite extreme.

> which can be explained by the loss of double caching.

If we could compare it against a version that uses only the cpu partial list,
the answer might become clearer.