BUG Report: Fork benchmark drop by 30% on aarch64

From: Hagar Hemdan
Date: Wed Feb 05 2025 - 10:20:20 EST


Hi,

There is about a 30% drop in the fork benchmark [1] on aarch64 and a 10%
drop on x86_64 with kernel v6.13.1.

Git bisect pointed to commit eff6c8ce8d4d ("sched/core: Reduce cost
of sched_move_task when config autogroup"), which was merged in
v6.4-rc1.

The regression only happens when the number of CPUs equals the number
of threads [2] that the fork test creates, which means it is only visible
under CPU contention.
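
For readers unfamiliar with the test, the kind of loop it measures can be
sketched roughly as below. This is not the UnixBench source: the function
name spawn_loop and the fixed duration are illustrative assumptions, and
the sketch runs a single copy, whereas the command in [2] was run with
-c 4.

```python
import os
import time


def spawn_loop(duration_s):
    """Fork and reap children for duration_s seconds; return loops per second.

    Hypothetical helper for illustration only, not taken from UnixBench.
    """
    loops = 0
    start = time.monotonic()
    while time.monotonic() - start < duration_s:
        pid = os.fork()
        if pid == 0:
            # Child: exit immediately without running any cleanup handlers.
            os._exit(0)
        # Parent: reap the child before forking again.
        os.waitpid(pid, 0)
        loops += 1
    elapsed = time.monotonic() - start
    return loops / elapsed


if __name__ == "__main__":
    print(f"{spawn_loop(1.0):.1f} lps")
```

Running one such loop per CPU is what puts the scheduler under the
contention described above.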

I used m6g.xlarge AWS EC2 Instance with 4 vCPUs and 16 GiB RAM for ARM64
and m6a.xlarge with also 4 vCPUs and 16 GiB RAM for x86_64.

I noticed this regression exists only when autogroup config is enabled.
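
One way to check the runtime autogroup state on a test machine is to read
the sched_autogroup_enabled sysctl, which is present when the kernel is
built with CONFIG_SCHED_AUTOGROUP. The sketch below assumes that
mechanism; the report does not state how autogroup was switched on or off,
and the helper name autogroup_enabled is illustrative.

```python
from pathlib import Path

# Runtime switch for autogroup; absent on kernels built without
# CONFIG_SCHED_AUTOGROUP.
AUTOGROUP_SYSCTL = Path("/proc/sys/kernel/sched_autogroup_enabled")


def autogroup_enabled(path=AUTOGROUP_SYSCTL):
    """Return True or False for the sysctl value, or None if the file is absent."""
    try:
        return Path(path).read_text().strip() == "1"
    except FileNotFoundError:
        # Assumption: a missing file means autogroup support is not built in.
        return None


if __name__ == "__main__":
    print("autogroup enabled:", autogroup_enabled())
```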

Results of the fork test with autogroup enabled:

Arch      | commit eff6c8ce8d4d | Fork Result (lps)  | %Cpu(s)
----------+---------------------+--------------------+------------------
aarch64   | without             | 28677.0            | 3.2 us, 96.7 sy
aarch64   | with                | 19860.7 (30% drop) | 2.7 us, 79.4 sy
x86_64    | without             | 27776.2            | 3.1 us, 96.9 sy
x86_64    | with                | 25020.6 (10% drop) | 4.1 us, 93.2 sy
----------+---------------------+--------------------+------------------

It seems that the commit caps the amount of CPU that can be utilized,
leaving around 18% idle on aarch64 and 3% idle on x86_64, which is
likely the main reason behind the reported fork regression.
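
The drop and idle percentages quoted here follow from the table figures.
A small sketch of the arithmetic is below; it assumes that whatever is not
accounted to us or sy is idle, since the other %Cpu(s) fields are not
listed in this report.

```python
# Figures copied from the autogroup-enabled table:
# (fork result in lps, %us, %sy) without and with commit eff6c8ce8d4d.
results = {
    "aarch64": {"without": (28677.0, 3.2, 96.7), "with": (19860.7, 2.7, 79.4)},
    "x86_64": {"without": (27776.2, 3.1, 96.9), "with": (25020.6, 4.1, 93.2)},
}


def drop_percent(before, after):
    """Relative drop from before to after, in percent."""
    return (before - after) / before * 100.0


def idle_percent(us, sy):
    """Assumption: all CPU time that is not us or sy counts as idle."""
    return 100.0 - us - sy


if __name__ == "__main__":
    for arch, rows in results.items():
        lps_before, _, _ = rows["without"]
        lps_after, us, sy = rows["with"]
        drop = drop_percent(lps_before, lps_after)
        idle = idle_percent(us, sy)
        print(f"{arch}: {drop:.1f}% drop, {idle:.1f}% idle with the commit")
```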

When autogroup is disabled:

Arch      | commit eff6c8ce8d4d | Fork Result (lps)  | %Cpu(s)
----------+---------------------+--------------------+------------------
aarch64   | without             | 19877.8            | 2.2 us, 80.1 sy
aarch64   | with                | 20086.3 (~same)    | 1.9 us, 80.2 sy
x86_64    | without             | 24974.2            | 4.9 us, 92.5 sy
x86_64    | with                | 24921.5 (~same)    | 4.9 us, 92.4 sy
----------+---------------------+--------------------+------------------

So when autogroup is disabled, I still see about 18% and 3% of CPU idle
on aarch64 and x86_64 respectively, regardless of the commit.

Is this performance drop an expected effect of this commit when
autogroup is enabled?

Thanks,
Hagar

[1] https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench
[2] Used command: ./Run -c 4 spawn