Re: [Regression] mm:slab/sheaves: severe performance regression in cross-CPU slab allocation

From: Vlastimil Babka (SUSE)

Date: Wed Feb 25 2026 - 08:26:19 EST


On 2/25/26 13:24, Ming Lei wrote:
> On Wed, Feb 25, 2026 at 12:29:26PM +0100, Vlastimil Babka (SUSE) wrote:
>> On 2/25/26 10:31, Ming Lei wrote:
>> > Hi Vlastimil,
>> >
>> > On Wed, Feb 25, 2026 at 09:45:03AM +0100, Vlastimil Babka (SUSE) wrote:
>> >> On 2/24/26 21:27, Vlastimil Babka wrote:
>> >> >
>> >> > It made sense to me not to refill sheaves when we can't reclaim, but I
>> >> > didn't anticipate this interaction with mempools. We could change them
>> >> > but there might be others using a similar pattern. Maybe it would be for
>> >> > the best to just drop that heuristic from __pcs_replace_empty_main()
>> >> > (but carefully as some deadlock avoidance depends on it, we might need
>> >> > to e.g. replace it with gfpflags_allow_spinning()). I'll send a patch
>> >> > tomorrow to test this theory, unless someone beats me to it (feel free to).
>> >> Could you try this then, please? Thanks!
>> >
>> > Thanks for working on this issue!
>> >
>> > Unfortunately the patch doesn't make a difference to IOPS in the perf test.
>> > The perf profile collected on Linus' tree (basically 7.0-rc1 with your patch) follows:
>>
>> Hm that's weird, still the slowpath is prominent in your profile.
>>
>> I followed your reproducer instructions, although only with a small
>> virtme-ng based setup. What's the output of "numactl -H" on yours, btw?
>
> available: 8 nodes (0-7)
> node 0 cpus: 0 1 2 3 32 33 34 35
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 1 cpus: 4 5 6 7 36 37 38 39
> node 1 size: 31906 MB
> node 1 free: 30572 MB
> node 2 cpus: 8 9 10 11 40 41 42 43
> node 2 size: 0 MB
> node 2 free: 0 MB
> node 3 cpus: 12 13 14 15 44 45 46 47
> node 3 size: 0 MB
> node 3 free: 0 MB
> node 4 cpus: 16 17 18 19 48 49 50 51
> node 4 size: 0 MB
> node 4 free: 0 MB
> node 5 cpus: 20 21 22 23 52 53 54 55
> node 5 size: 32135 MB
> node 5 free: 31086 MB
> node 6 cpus: 24 25 26 27 56 57 58 59
> node 6 size: 0 MB
> node 6 free: 0 MB
> node 7 cpus: 28 29 30 31 60 61 62 63
> node 7 size: 0 MB
> node 7 free: 0 MB
> node distances:
> node   0   1   2   3   4   5   6   7
>   0:  10  12  12  12  32  32  32  32
>   1:  12  10  12  12  32  32  32  32
>   2:  12  12  10  12  32  32  32  32
>   3:  12  12  12  10  32  32  32  32
>   4:  32  32  32  32  10  12  12  12
>   5:  32  32  32  32  12  10  12  12
>   6:  32  32  32  32  12  12  10  12
>   7:  32  32  32  32  12  12  12  10

Oh right, memory-less nodes, of course. Always so much fun.
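
For anyone wanting to check their own box, here is a minimal sketch that picks
the memoryless nodes (and their CPUs) out of "numactl -H" text. It assumes the
"node N cpus:" / "node N size: X MB" line format shown above. The helper name
memoryless_nodes() and the shortened sample are mine, for illustration only.

```python
import re

def memoryless_nodes(numactl_text):
    """Return {node: [cpus]} for nodes whose reported size is 0 MB."""
    cpus, sizes = {}, {}
    for line in numactl_text.splitlines():
        m = re.match(r"\s*node (\d+) cpus:\s*(.*)", line)
        if m:
            cpus[int(m.group(1))] = [int(c) for c in m.group(2).split()]
            continue
        m = re.match(r"\s*node (\d+) size:\s*(\d+) MB", line)
        if m:
            sizes[int(m.group(1))] = int(m.group(2))
    return {n: cpus.get(n, []) for n, size in sizes.items() if size == 0}

# Shortened sample taken from the first two nodes of the output above.
sample = """\
node 0 cpus: 0 1 2 3 32 33 34 35
node 0 size: 0 MB
node 1 cpus: 4 5 6 7 36 37 38 39
node 1 size: 31906 MB
"""
print(memoryless_nodes(sample))
```

Feeding it the full output above should report nodes 0, 2, 3, 4, 6 and 7.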

>>
>> Anyway what I saw is my patch raised the IOPS substantially, and with
>> CONFIG_SLUB_STATS=y enabled I could see that
>> /sys/kernel/slab/bio-248/alloc_slowpath had substantial values before the
>> patch and zero afterwards.
>>
>> Maybe if you could also enable CONFIG_SLUB_STATS=y and see in which cache(s)
>> there's significant alloc_slowpath even after the patch, it could help.
>
> Patched:
>
> /sys/kernel/slab/bio-264
> ./alloc_slowpath:83555260 C0=33 C1=6717992 C2=9 C3=6611030 C8=128 C9=6802316 C11=6934363 C13=6721479 C14=66 C15=6694472 C16=96 C17=7286868 C18=128 C19=7369091 C24=128 C25=7288673 C26=51 C27=6800502 C28=129 C29=7095073 C31=7232628 C43=4 C56=1

Yeah, no slowpath allocations from cpus that are *not* on a memoryless node.
Thanks, that will help to focus what to look at.
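
In case it's useful for repeating the check on other caches, a rough sketch of
how the per-cpu counters can be cross-checked against the topology. It assumes
the "total C<cpu>=<count> ..." stat format seen above (with or without the
"file:" prefix that grep adds), and the CPU set is copied by hand from nodes 1
and 5 of the numactl output. parse_slub_stat() is just a name I made up.

```python
def parse_slub_stat(line):
    """Parse '[name:]total C0=.. C1=..' into (total, {cpu: count})."""
    head, *fields = line.split()
    total = int(head.rsplit(":", 1)[-1])  # drop an optional 'file:' prefix
    per_cpu = {}
    for field in fields:
        cpu, count = field.split("=")
        per_cpu[int(cpu[1:])] = int(count)  # 'C17' -> 17
    return total, per_cpu

# CPUs of nodes 1 and 5, the only nodes with memory in the numactl -H output.
CPUS_WITH_LOCAL_MEMORY = {4, 5, 6, 7, 36, 37, 38, 39,
                          20, 21, 22, 23, 52, 53, 54, 55}

line = ("./alloc_slowpath:83555260 C0=33 C1=6717992 C2=9 C3=6611030 C8=128 "
        "C9=6802316 C11=6934363 C13=6721479 C14=66 C15=6694472 C16=96 "
        "C17=7286868 C18=128 C19=7369091 C24=128 C25=7288673 C26=51 "
        "C27=6800502 C28=129 C29=7095073 C31=7232628 C43=4 C56=1")

total, per_cpu = parse_slub_stat(line)
offenders = sorted(set(per_cpu) & CPUS_WITH_LOCAL_MEMORY)
print("per-cpu sum matches total:", sum(per_cpu.values()) == total)
print("cpus with local memory in slowpath:", offenders)
```

An empty list on the second line means every slowpath allocation came from a
cpu on a memoryless node.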

>
> Also config.tar.gz is attached.
>
> Thanks,
> Ming