Re: [PATCH 2/2] sched/fair: Adjust the allowed NUMA imbalance when SD_NUMA spans multiple LLCs

From: K Prateek Nayak
Date: Wed Feb 09 2022 - 00:25:44 EST


Hello Mel,

On 2/8/2022 3:13 PM, Mel Gorman wrote:
[..snip..]
> On a Zen3 machine running STREAM parallelised with OMP to have one instance
> per LLC and without binding, the results are
>
>                              5.17.0-rc0             5.17.0-rc0
>                                 vanilla       sched-numaimb-v6
> MB/sec copy-16     162596.94 (   0.00%)   580559.74 ( 257.05%)
> MB/sec scale-16    136901.28 (   0.00%)   374450.52 ( 173.52%)
> MB/sec add-16      157300.70 (   0.00%)   564113.76 ( 258.62%)
> MB/sec triad-16    151446.88 (   0.00%)   564304.24 ( 272.61%)

I was able to test STREAM without binding on different
NPS configurations of a two-socket Zen3 machine.

The results look good (STREAM bandwidth in MB/s, higher is better):

sched-tip - 5.17.0-rc1 tip sched/core
mel-v6 - 5.17.0-rc1 tip sched/core + this patch

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
STREAM with 16 threads,
built with -DSTREAM_ARRAY_SIZE=128000000 -DNTIMES=10
Zen3, 64C128T per socket, 2 sockets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NPS1

Test:   sched-tip               mel-v6
Copy:   114470.18 (0.00 pct)    152806.94 (33.49 pct)
Scale:  111575.12 (0.00 pct)    189784.57 (70.09 pct)
Add:    125436.15 (0.00 pct)    213371.05 (70.10 pct)
Triad:  123068.86 (0.00 pct)    209809.11 (70.48 pct)

NPS2

Test:   sched-tip               mel-v6
Copy:    57936.28 (0.00 pct)    155038.70 (167.60 pct)
Scale:   55599.30 (0.00 pct)    192601.59 (246.41 pct)
Add:     63096.96 (0.00 pct)    211462.58 (235.13 pct)
Triad:   61983.39 (0.00 pct)    208909.34 (237.04 pct)

NPS4

Test:   sched-tip               mel-v6
Copy:    43946.42 (0.00 pct)    119583.69 (172.11 pct)
Scale:   33750.96 (0.00 pct)    180130.83 (433.70 pct)
Add:     39109.72 (0.00 pct)    170296.68 (335.43 pct)
Triad:   36598.88 (0.00 pct)    169953.47 (364.36 pct)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
STREAM with 16 threads,
built with -DSTREAM_ARRAY_SIZE=128000000 -DNTIMES=100
Zen3, 64C128T per socket, 2 sockets
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NPS1

Test:   sched-tip               mel-v6
Copy:   132402.79 (0.00 pct)    225587.85 (70.37 pct)
Scale:  126923.02 (0.00 pct)    214363.58 (68.89 pct)
Add:    145596.55 (0.00 pct)    260901.92 (79.19 pct)
Triad:  143092.91 (0.00 pct)    249081.79 (74.06 pct)

NPS2

Test:   sched-tip               mel-v6
Copy:   107386.27 (0.00 pct)    227623.31 (111.96 pct)
Scale:  100941.44 (0.00 pct)    218116.63 (116.08 pct)
Add:    115854.52 (0.00 pct)    272756.95 (135.43 pct)
Triad:  113369.96 (0.00 pct)    260235.32 (129.54 pct)

NPS4

Test:   sched-tip               mel-v6
Copy:    91083.07 (0.00 pct)    247163.90 (171.36 pct)
Scale:   90352.54 (0.00 pct)    223914.31 (147.82 pct)
Add:    101973.98 (0.00 pct)    272842.42 (167.56 pct)
Triad:   99773.65 (0.00 pct)    258904.54 (159.49 pct)
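
For anyone re-deriving the "pct" columns: they appear to be the relative
change of mel-v6 over sched-tip, i.e. (new - base) / base * 100. The
snippet below is only a sketch under that assumption, not the script
used to produce the tables. The helper name pct_change is made up for
illustration, and the last digit can differ from the posted figures
depending on how the original numbers were rounded.

```python
def pct_change(baseline: float, patched: float) -> float:
    """Relative change of `patched` over `baseline`, in percent.

    Assumed formula: (patched - baseline) / baseline * 100.
    """
    return (patched - baseline) / baseline * 100.0


# (sched-tip, mel-v6) bandwidth pairs copied from the NPS1, NTIMES=10 table.
nps1_ntimes10 = {
    "Copy": (114470.18, 152806.94),
    "Scale": (111575.12, 189784.57),
    "Add": (125436.15, 213371.05),
    "Triad": (123068.86, 209809.11),
}

for kernel, (base, new) in nps1_ntimes10.items():
    # Print in the same "(xx.xx pct)" style as the tables.
    print(f"{kernel + ':':<8}{base:>10.2f} (0.00 pct)  "
          f"{new:>10.2f} ({pct_change(base, new):.2f} pct)")
```

Positive values mean mel-v6 delivered more bandwidth than sched-tip.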


There is a significant improvement across the board,
with v6 outperforming tip/sched/core in every case!

Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>

--
Thanks and Regards
Prateek