BUG Report: Fork benchmark drop by 30% on aarch64
From: Hagar Hemdan
Date: Wed Feb 05 2025 - 10:20:20 EST
Hi,
There is about a 30% drop in the fork benchmark [1] on aarch64 and a 10%
drop on x86_64 using kernel v6.13.1.
Git bisect pointed to commit eff6c8ce8d4d ("sched/core: Reduce cost
of sched_move_task when config autogroup"), which was merged in
v6.4-rc1.
The regression only appears when the number of CPUs equals the number
of threads [2] that the fork test creates, which means it is only visible
under CPU contention.
I used an m6g.xlarge AWS EC2 instance (4 vCPUs, 16 GiB RAM) for aarch64
and an m6a.xlarge instance (also 4 vCPUs, 16 GiB RAM) for x86_64.
I noticed that this regression exists only when autogroup is enabled.
Fork test results for these combinations with autogroup enabled:
Arch    | commit eff6c8ce8d4d | Fork Result (lps)  | %Cpu(s)
--------+---------------------+--------------------+-----------------
aarch64 | without             | 28677.0            | 3.2 us, 96.7 sy
aarch64 | with                | 19860.7 (30% drop) | 2.7 us, 79.4 sy
x86_64  | without             | 27776.2            | 3.1 us, 96.9 sy
x86_64  | with                | 25020.6 (10% drop) | 4.1 us, 93.2 sy
--------+---------------------+--------------------+-----------------
It seems that the commit caps the amount of CPU that can be utilized,
leaving around 18% idle on aarch64 and 3% idle on x86_64, which is
likely the main reason for the reported fork regression.
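For reference, the percentages quoted above can be re-derived from the
table with a few lines of Python. This is only a sketch of the arithmetic:
the helper names are mine, and it assumes that everything not accounted
to "us" or "sy" is idle time, which ignores iowait, irq and steal.

```python
def pct_drop(before: float, after: float) -> float:
    """Relative drop from 'before' to 'after', in percent."""
    return (before - after) / before * 100.0


def idle_pct(us: float, sy: float) -> float:
    """Idle share, assuming all non-us/non-sy time is idle (an assumption)."""
    return 100.0 - us - sy


# Values copied from the autogroup-enabled table:
# (lps without commit, lps with commit, %us with commit, %sy with commit)
results = {
    "aarch64": (28677.0, 19860.7, 2.7, 79.4),
    "x86_64": (27776.2, 25020.6, 4.1, 93.2),
}

for arch, (without, with_commit, us, sy) in results.items():
    print(f"{arch}: drop {pct_drop(without, with_commit):.1f}%, "
          f"idle with commit {idle_pct(us, sy):.1f}%")
```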
When autogroup is disabled:
Arch    | commit eff6c8ce8d4d | Fork Result (lps)  | %Cpu(s)
--------+---------------------+--------------------+-----------------
aarch64 | without             | 19877.8            | 2.2 us, 80.1 sy
aarch64 | with                | 20086.3 (~same)    | 1.9 us, 80.2 sy
x86_64  | without             | 24974.2            | 4.9 us, 92.5 sy
x86_64  | with                | 24921.5 (~same)    | 4.9 us, 92.4 sy
--------+---------------------+--------------------+-----------------
So with autogroup disabled, I still see about 18% and 3% idle CPU on
aarch64 and x86_64 respectively, regardless of the commit.
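For anyone reproducing this, the runtime autogroup state can be checked
through the kernel.sched_autogroup_enabled sysctl. The sketch below
assumes autogroup is toggled at runtime via that /proc file; this report
does not say which method was used (autogroup can also be disabled at
boot with "noautogroup" or left out of the build), and the function name
is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Exposed only on kernels built with CONFIG_SCHED_AUTOGROUP.
AUTOGROUP_SYSCTL = "/proc/sys/kernel/sched_autogroup_enabled"


def autogroup_enabled(path: str = AUTOGROUP_SYSCTL) -> Optional[bool]:
    """Return True/False for the sysctl value, or None if the file is absent."""
    sysctl = Path(path)
    if not sysctl.exists():
        return None  # autogroup not available on this kernel
    return sysctl.read_text().strip() == "1"


if __name__ == "__main__":
    print("autogroup enabled:", autogroup_enabled())
```

A result of None means the file does not exist, so the comparison between
"enabled" and "disabled" runs would not apply on that kernel.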
Is this performance drop an expected effect of this commit when autogroup
is enabled?
Thanks,
Hagar
[1] https://github.com/kdlucas/byte-unixbench/blob/master/UnixBench
[2] Used command: ./Run -c 4 spawn
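
For readers who do not have UnixBench at hand, the kind of loop the spawn
test exercises can be approximated as below. This is a rough sketch and
not the UnixBench source: the function name and the one-second duration
are my own choices, and it runs a single copy, whereas the command in [2]
runs four copies in parallel.

```python
import os
import time


def spawn_loops_per_second(duration_s: float = 1.0) -> float:
    """Fork a child that exits at once, reap it, and repeat until time is up."""
    start = time.monotonic()
    deadline = start + duration_s
    loops = 0
    while time.monotonic() < deadline:
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately, skipping Python cleanup
        os.waitpid(pid, 0)  # parent: wait for the child before the next loop
        loops += 1
    elapsed = time.monotonic() - start
    return loops / elapsed


if __name__ == "__main__":
    print(f"{spawn_loops_per_second():.1f} lps")
```

Absolute numbers from this sketch are not comparable with the tables
above, since the interpreter adds its own overhead to every fork.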