Re: [PATCH] sched/topology: improve topology_span_sane speed

From: Steve Wahl
Date: Fri Nov 01 2024 - 16:06:01 EST


On Tue, Oct 29, 2024 at 11:04:52PM +0530, samir wrote:
>
> I have verified this patch on PowerPC. Below are the results for "time
> ppc64_cpu --smt=off/4", collected over 5 iterations (min, max, average,
> and std dev).
>
> ——————Without patch——————
> ————uname -a————
> 6.12.0-rc5
>
> ————lscpu————
> lscpu
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 360
> On-line CPU(s) list: 0-359
> NUMA:
> NUMA node(s): 4
> NUMA node0 CPU(s): 0-95
> NUMA node1 CPU(s): 96-191
> NUMA node2 CPU(s): 192-271
> NUMA node3 CPU(s): 272-359
>
> Without Patch:
> Metric    SMT Off (s)  SMT 4 (s)
> Min       68.63        37.64
> Max       74.92        39.39
> Average   70.92        38.48
> Std Dev   2.22         0.63
>
>
> ——————With patch——————
> ————uname -a————
> 6.12.0-rc5-dirty
>
> ————lscpu————
> lscpu
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 360
> On-line CPU(s) list: 0-359
> NUMA:
> NUMA node(s): 4
> NUMA node0 CPU(s): 0-95
> NUMA node1 CPU(s): 96-191
> NUMA node2 CPU(s): 192-271
> NUMA node3 CPU(s): 272-359
>
> With Patch:
> Metric    SMT Off (s)  SMT 4 (s)
> Min       68.748       33.442
> Max       72.954       38.042
> Average   70.309       36.206
> Std Dev   1.41         1.66
>
> From the results, there is no significant improvement overall; however,
> with the patch applied, the SMT=4 case shows a lower average time (36.21s
> vs. 38.48s) and a higher standard deviation (1.66s vs. 0.63s) compared to
> the run without the patch.

Samir,

I found your results interesting, so I tried to compare on our systems,
and I get similar results: around 300 processors this patch makes little
difference, but at higher CPU counts the topology_span_sane() change has
more influence.

I don't have access to a PPC system, but I tried to recreate similar
results on our x86_64 systems. I took an 8-socket, 60-core/socket,
2-thread/core system (960 CPUs) and limited it to 20 physical
cores/socket (320 CPUs) for comparison.

I'm using scripts from Intel's System Health Check,
"Set-Half-Of-The-Cores-Offline.sh" and "Set-All-Cores-Online.sh", but
similar results could be obtained with anything that manipulates
/sys/devices/system/cpu/cpu*/online.
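
For anyone who wants to reproduce this without those scripts, the sketch
below captures the idea. It is not the Intel script, and which CPUs it
picks is an arbitrary choice of mine; run as root.

  #!/bin/sh
  # Sketch only: offline the upper half of the CPUs, then bring
  # everything back online.  cpu0 usually has no 'online' file, so the
  # glob in the second loop simply skips it.
  NCPUS=$(getconf _NPROCESSORS_CONF)

  i=$((NCPUS / 2))
  while [ "$i" -lt "$NCPUS" ]; do
          echo 0 > "/sys/devices/system/cpu/cpu$i/online"
          i=$((i + 1))
  done

  for f in /sys/devices/system/cpu/cpu*/online; do
          echo 1 > "$f"
  done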

I also found that the first offlining attempt after a reboot goes much
faster, so I threw out that first result and then measured 5 iterations.
(The reason for this probably needs exploration, but it happens for me on
both the patched and unpatched kernels.)
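
The measurement itself was nothing elaborate; roughly along the lines of
this sketch (not my exact commands; it assumes GNU time is installed and
that the two scripts are in the current directory):

  #!/bin/sh
  # Discard the first run after reboot, then time 5 iterations of each
  # script; GNU time's %e is elapsed wall-clock seconds.
  ./Set-Half-Of-The-Cores-Offline.sh && ./Set-All-Cores-Online.sh  # warm-up, discarded

  for i in 1 2 3 4 5; do
          /usr/bin/time -f "offline %e" ./Set-Half-Of-The-Cores-Offline.sh
          /usr/bin/time -f "online %e"  ./Set-All-Cores-Online.sh
  done 2> times.txt

  # min / max / avg / sample std.dev. for one column, e.g. the offline times:
  grep '^offline' times.txt | awk '
          { t[NR] = $2; sum += $2 }
          END {
                  n = NR; avg = sum / n
                  for (i = 1; i <= n; i++) {
                          d = t[i] - avg; ss += d * d
                          if (i == 1 || t[i] < min) min = t[i]
                          if (i == 1 || t[i] > max) max = t[i]
                  }
                  printf "min %.3f  max %.3f  avg %.3f  std.dev. %.7f\n",
                         min, max, avg, sqrt(ss / (n - 1))
          }'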

All times in seconds.

With 20 cores / socket (320 CPUs counting hyperthreads):

Without patch:
           Half-Offline  All-Online
min        21.47         30.76
max        22.35         31.31
avg        22.04         31.124
std.dev.   0.3419795     0.2175545

With patch:
           Half-Offline  All-Online
min        20.43         28.23
max        21.93         29.76
avg        20.786        28.874
std.dev.   0.6435293     0.6366553

Not a huge difference at this level.

At 60 cores / socket (960 CPUs counting hyperthreads):

Without patch:
           Half-Offline  All-Online
min        275.34        321.47
max        288.05        331.89
avg        282.964       326.884
std.dev.   5.8835813     4.0268945

With patch:
           Half-Offline  All-Online
min        208.9         247.17
max        219.49        251.48
avg        212.392       249.394
std.dev.   4.1717586     1.6904526

Here the patch starts to make a real difference, and the gap only widens
as the number of CPUs goes up.

I should note that I made my measurements with v2 of the patch, posted
recently. Version 2 also removes a memory allocation, which might have
improved things.

Thanks,

--> Steve Wahl

--
Steve Wahl, Hewlett Packard Enterprise