Re: [PATCH v4 00/22] slab: replace cpu (partial) slabs with sheaves

From: Vlastimil Babka

Date: Wed Feb 04 2026 - 13:06:23 EST

On 1/30/26 05:50, Hao Li wrote:
> On Thu, Jan 29, 2026 at 04:28:01PM +0100, Vlastimil Babka wrote:
>>
>> So previously those would become kind of double
>> cached by both sheaves and cpu (partial) slabs (and thus presumably benefited
>> more than they should have) since the introduction of sheaves in 6.18, and
>> now they are not double cached anymore?
>>
>
> I've run new tests; here are the details of the three scenarios:
>
> 1. Checked out commit 9d4e6ab865c4, which represents the state before the
> introduction of the sheaves mechanism.
> 2. Tested with 6.19-rc5, which includes sheaves but does not yet apply the
> "sheaves for all" patchset.
> 3. Applied the "sheaves for all" patchset and also included the "avoid
> list_lock contention" patch.
>
>
> Results:
>
> For scenario 2 (with sheaves but without "sheaves for all"), there is a
> noticeable performance improvement compared to scenario 1:
>
> will-it-scale.128.processes +34.3%
> will-it-scale.192.processes +35.4%
> will-it-scale.64.processes +31.5%
> will-it-scale.per_process_ops +33.7%
>
> For scenario 3 (after applying "sheaves for all"), performance slightly
> regressed compared to scenario 1:
>
> will-it-scale.128.processes -1.3%
> will-it-scale.192.processes -4.2%
> will-it-scale.64.processes -1.2%
> will-it-scale.per_process_ops -2.1%
>
> Analysis:
>
> So with the maple node sheaf size at its default of 32, fully adopting the
> sheaves mechanism performs roughly on par with the previous approach, which
> relied solely on the percpu partial slab list.
>
> The regression observed with the "sheaves for all" patchset can be explained
> as follows: moving from scenario 1 to scenario 2 adds an extra cache layer
> (sheaves on top of the percpu partial slabs), which boosts performance.
> Moving from scenario 2 to scenario 3 removes that extra layer again, so
> performance returns to roughly its original level.
>
> So I think the performance of the percpu partial list and the sheaves mechanism
> is roughly the same, which is consistent with our expectations.
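
Both result sets above are expressed relative to scenario 1, so the size of the drop from scenario 2 to scenario 3 (the removal of the extra cache layer) is their ratio rather than their difference. A small sketch of that arithmetic, assuming both percentage sets were measured against the same scenario 1 baseline run; the helper name change_vs is hypothetical:

```python
# Percent changes versus scenario 1 (pre-sheaves), as reported above.
# Assumption: both sets share the same scenario 1 baseline run.
scenario2 = {"128.processes": 34.3, "192.processes": 35.4,
             "64.processes": 31.5, "per_process_ops": 33.7}
scenario3 = {"128.processes": -1.3, "192.processes": -4.2,
             "64.processes": -1.2, "per_process_ops": -2.1}


def change_vs(base_pct: float, new_pct: float) -> float:
    """Percent change of `new` relative to `base`, where both are given
    as percent changes against a common baseline (hypothetical helper)."""
    return ((1 + new_pct / 100) / (1 + base_pct / 100) - 1) * 100


# Implied change when going from scenario 2 to scenario 3, per metric.
s3_vs_s2 = {m: change_vs(scenario2[m], scenario3[m]) for m in scenario2}
for metric, pct in s3_vs_s2.items():
    print(f"{metric:16} {pct:+.1f}%")
```

Under that assumption, scenario 3 lands roughly 25-29% below scenario 2 on every metric, which is the size of the "extra layer" effect described above, while staying within about 4% of scenario 1.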

Thanks!