Re: [PATCH 4/4] sched/fair: Proportional newidle balance
From: Mario Roy
Date: Mon Jan 26 2026 - 23:16:07 EST
I tried the Stress-NG socket activity test, and also ran it with the
prefer-idle-core patch. That patch is aimed at tests that only partially
saturate the CPU.
AMD Ryzen Threadripper 9960X CPU (24/48)
Bogo operations/second, More is better
                    A        B        C        D        E        F
SocketAct     12128.7  13907.6  12377.7  10551.7  12158.7  11842.2
SocketAct24   64553.3  20072.0  67018.7  62182.3  18133.5  66756.6
SocketAct15   49206.3  22170.7  57038.7  44077.6  19884.1  56727.5
SocketAct10   35263.5  20140.3  40092.1  33040.3  19701.6  41346.3
The kernels are built with clang without LTO/AutoFDO
A. 6.19-rc7 next_buddy enabled,  with sched/fair: Proportional newidle balance
B. 6.19-rc7 next_buddy enabled,  without sched/fair: Proportional newidle balance
C. 6.19-rc7 next_buddy enabled,  without sched regression; with prefer-idle-core
D. 6.19-rc7 next_buddy disabled, with sched/fair: Proportional newidle balance
E. 6.19-rc7 next_buddy disabled, without sched/fair: Proportional newidle balance
F. 6.19-rc7 next_buddy disabled, without sched regression; with prefer-idle-core
Without sched regression:
  this is without sched/fair: Proportional newidle balance
With prefer-idle-core:
  https://github.com/marioroy/cachymod/blob/main/linux-cachymod-6.18/0280-prefer-prevcpu-for-wakeup.patch
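As a rough reading aid (not part of the test run itself), the relative
throughput between kernels can be worked out from the table above with a
few lines of Python. The numbers are copied verbatim from the table; the
helper name `ratio` and the choice of A/B and D/E as the pairs to compare
are only illustrative.

```python
# Bogo ops/s copied from the results table; columns A..F follow the legend.
results = {
    "SocketAct":   [12128.7, 13907.6, 12377.7, 10551.7, 12158.7, 11842.2],
    "SocketAct24": [64553.3, 20072.0, 67018.7, 62182.3, 18133.5, 66756.6],
    "SocketAct15": [49206.3, 22170.7, 57038.7, 44077.6, 19884.1, 56727.5],
    "SocketAct10": [35263.5, 20140.3, 40092.1, 33040.3, 19701.6, 41346.3],
}
COLS = "ABCDEF"


def ratio(test, num, den):
    """Throughput of kernel `num` relative to kernel `den` for one test."""
    row = results[test]
    return row[COLS.index(num)] / row[COLS.index(den)]


# A/B and D/E differ only in whether "Proportional newidle balance" is applied
# (next_buddy enabled and disabled, respectively).
for test in results:
    print(f"{test:12s} A/B = {ratio(test, 'A', 'B'):.2f}  "
          f"D/E = {ratio(test, 'D', 'E'):.2f}")
```

Any other pair of columns (for example C/A) can be compared the same way
by changing the two letters passed to `ratio`.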
Stress-NG 0.20.00: SocketAct, SocketAct24, SocketAct15, SocketAct10
stress-ng -t 30 --metrics-brief --sock -1 --no-rand-seed --sock-zerocopy
stress-ng -t 30 --metrics-brief --sock 24 --no-rand-seed --sock-zerocopy
stress-ng -t 30 --metrics-brief --sock 15 --no-rand-seed --sock-zerocopy
stress-ng -t 30 --metrics-brief --sock 10 --no-rand-seed --sock-zerocopy
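Basically, the first three runs start as many instances as 100%, 50%, and
31.25% of the 48 CPU threads, times 2 processes each (writer, reader).
I also ran --sock 10 because 10 x 2 = 20 processes is less than 50% of the
CPU (24 threads).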
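The arithmetic behind those percentages can be sketched as follows. This
assumes that `--sock -1` resolves to one instance per online CPU thread
(48 on this machine) and that each instance is one writer plus one reader,
as described above; the function name `load` is only illustrative.

```python
HW_THREADS = 48         # Threadripper 9960X (24/48); assumed online thread count
PROCS_PER_INSTANCE = 2  # one writer plus one reader per --sock instance


def load(instances):
    """Return (instances as a percentage of CPU threads, total processes)."""
    return 100.0 * instances / HW_THREADS, instances * PROCS_PER_INSTANCE


# 48 stands in for --sock -1 under the assumption stated above.
for n in (48, 24, 15, 10):
    pct, procs = load(n)
    print(f"--sock {n:2d}: {pct:6.2f}% of threads, {procs:2d} processes")
```

With --sock 10 the total of 20 processes stays below the 24 threads that
make up half of the CPU, which is the case the last run was meant to cover.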
Linux 6.18.7 results: granted, both are built with LTO + AutoFDO profile
              CachyOS 6.18.7-2   CachyMod 6.18.7-2 [1]
SocketAct          40799.2             46784.3
SocketAct24        61057.6             71414.5
SocketAct15        45056.4             61772.3
SocketAct10        32691.6             44244.6

[1] https://github.com/marioroy/cachymod
    the sched regression reverted (0040 patch)
    prefer-idle-core (0280 patch)
On 1/23/26 6:03 AM, Peter Zijlstra wrote:
> On Fri, Jan 23, 2026 at 11:50:46AM +0100, Peter Zijlstra wrote:
> > On Sun, Jan 18, 2026 at 03:46:22PM -0500, Mario Roy wrote:
> > > The patch "Proportional newidle balance" introduced a regression
> > > with Linux 6.12.65 and 6.18.5. There is noticeable regression with
> > > easyWave testing. [1]
> > >
> > > The CPU is AMD Threadripper 9960X CPU (24/48). I followed the source
> > > to install easyWave [2]. That is fetching the two tar.gz archives.
> >
> > What is the actual configuration of that chip? Is it like 3*8 or 4*6
> > (CCX wise). A quick google couldn't find me the answer :/
>
> Obviously I found it right after sending this. It's a 4x6 config.
> Meaning it needs newidle to balance between those 4 domains.
>
> Pratheek -- are you guys still considering that SIS_NODE thing? That
> worked really well for workstation chips, but there were some issues on
> Epyc or so.
>
> > > #!/bin/bash
> > > # CXXFLAGS="-O3 $CXXFLAGS" ./configure
> > > # make -j8
> > > trap 'rm -f *.ssh *.idx *.log *.sshmax *.time' EXIT
> > > OMP_NUM_THREADS=48 ./src/easywave \
> > >   -grid examples/e2Asean.grd -source examples/BengkuluSept2007.flt \
> > >   -time 1200
> > >
> > > Before results with CachyOS 6.12.63-2 and 6.18.3-2 kernels.
> >
> > So the problem is that 6.12 -> 6.18 is an enormous amount of kernel
> > releases :/ This patch in particular was an effort to fix a regression
> > caused by:
> >
> >   155213a2aed4 ("sched/fair: Bump sd->max_newidle_lb_cost when newidle balance fails")
> >
> > I'm thinking that if you revert all 4 patches of this series your
> > performance will be even worse?
> >
> > Anyway, my guess is that somehow this benchmark likes doing newidle even
> > if it is often not successful. I'll see if I can reproduce this on one
> > of my machines, but that might take a little while.