Re: [PATCH v2 00/23] Cache aware scheduling
From: Chen, Yu C
Date: Fri Dec 19 2025 - 08:04:41 EST
On 12/19/2025 11:19 AM, Aaron Lu wrote:
On Wed, Dec 03, 2025 at 03:07:19PM -0800, Tim Chen wrote:
... ...
Test results:
The patch series was applied and tested on v6.18-rc7.
See: https://github.com/timcchen1298/linux/commits/cache_aware_v2
The first test platform is a 2-socket Intel Sapphire Rapids with 30
cores per socket. DRAM interleaving is enabled in the BIOS, so it
essentially has one NUMA node with two last-level caches. There are 60
CPUs associated with each last-level cache.
The second test platform is an AMD Genoa. There are 4 nodes and 32 CPUs
per node. Each node has 2 CCXs, and each CCX has 16 CPUs.
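As a reproduction aid, the LLC layout described above can be read back
from sysfs (a minimal sketch using the standard Linux cacheinfo
interface; the highest index is usually the L3/LLC, but verify via its
`level` file on the machine at hand):

```shell
# For each cache index of CPU 0, print its level, type, and the CPUs
# sharing it. On the SPR box above the L3 entry should list 60 CPUs;
# on Genoa, 16 CPUs per CCX.
for idx in /sys/devices/system/cpu/cpu0/cache/index*; do
    printf 'L%s (%s): CPUs %s\n' \
        "$(cat "$idx/level")" "$(cat "$idx/type")" \
        "$(cat "$idx/shared_cpu_list")"
done
```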
hackbench/schbench/netperf/stream/stress-ng/chacha20 were launched on
these two platforms.
[TL;DR]
Sapphire Rapids:
hackbench shows significant improvement when the number of
different active threads is below the capacity of a LLC.
schbench shows overall wakeup latency improvement.
ChaCha20-xiangshan shows good throughput improvement.
Genoa:
ChaCha20-xiangshan shows huge throughput improvement.
No obvious difference is observed in hackbench/schbench.
I think for hackbench runs with a small number of tasks, there should
be some improvement.
I tried thread/pipe/2fds/1group, i.e. 4 tasks on Genoa:
./hackbench -T -f 2 -g 1 -p -l 2000000
And I noticed performance improved a lot:
(Result in seconds, lower is better)

        llc_off      llc_on       diff
time    4.755±1.6%   2.684±6.25%  +43.6%
llc_off means /sys/kernel/debug/sched/llc_enabled is set to 0, while
llc_on means it is set to 1; other tunables are left unchanged.
Turbo is disabled and the cpufreq governor is set to performance.
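For reference, a back-to-back comparison under both settings can be
scripted as below (a sketch assuming the debugfs knob from this series
and a hackbench binary in the current directory, as in the command
shown above):

```shell
# Run the same hackbench invocation with the cache-aware scheduling
# knob (path taken from the text above) off, then on.
echo 0 > /sys/kernel/debug/sched/llc_enabled   # llc_off baseline
./hackbench -T -f 2 -g 1 -p -l 2000000
echo 1 > /sys/kernel/debug/sched/llc_enabled   # llc_on
./hackbench -T -f 2 -g 1 -p -l 2000000
```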
I also tried redis and noticed that, with io-threads set to 4 in
redis.conf, there is also some improvement on AMD Genoa:
            llc_off     manual      diff    llc_on      diff
throughput  1536727±0%  1737619±0%  +13.1%  1737720±0%  +13.1%
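As a sanity check, the quoted percentages follow from the raw numbers
(a small standalone sketch; for the lower-is-better hackbench time the
improvement is (baseline - new) / baseline, for throughput it is
(new - baseline) / baseline):

```shell
# hackbench runtime (seconds), llc_off vs llc_on
awk 'BEGIN { printf "+%.1f%%\n", (4.755 - 2.684) / 4.755 * 100 }'
# -> +43.6%
# redis throughput (requests/s), llc_off vs llc_on
awk 'BEGIN { printf "+%.1f%%\n", (1737720 - 1536727) / 1536727 * 100 }'
# -> +13.1%
```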
Client cmdline:
numactl -N 1 redis-benchmark --threads 4 -t set -r 100000 -P 16 -n 10000000
Server cmdline:
numactl -N 0 redis-server ./redis.conf
I also tried manually binding all tasks of the redis server to a single
LLC to see if this workload benefits from aggregation; that is what
'manual' means: taskset -c 8-15,200-207 redis-server ./redis.conf
According to the results, I think this 'cache aware scheduling' works
as expected, in that its performance matches manual binding, and both
beat llc_off.
Thanks Aaron for the test!
thanks,
Chenyu