Re: [PATCH v9 0/9] Add latency priority for CFS class
From: K Prateek Nayak
Date: Wed Dec 07 2022 - 11:27:39 EST
Hello Vincent,
Thank you for taking a look at the report.
On 11/28/2022 10:49 PM, Vincent Guittot wrote:
> Hi Prateek,
>
> On Mon, 28 Nov 2022 at 12:52, K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>>
>> Hello Vincent,
>>
>> Following are the test results on a dual socket Zen3 machine (2 x 64C/128T).
>>
>> tl;dr
>>
>> o All benchmarks with the DEFAULT_LATENCY_NICE value are comparable to tip.
>> There is, however, a noticeable dip for the unixbench-spawn test case.
>>
>> o With the 2 rbtree approach, I do not see much difference in the
>> hackbench results with varying latency nice values. Tests on v5 did
>> yield noticeable improvements for hackbench.
>> (https://lore.kernel.org/lkml/cd48ebbb-9724-985f-28e3-e558dea07827@xxxxxxx/)
>
> The 2 rbtree approach is the one that was already used in v5. I just
> reran the hackbench tests with the latest tip and v6.2-rc7 and I can
> see a large performance improvement for the pipe tests on my 8-core
> system. Could you try with a larger number of groups, like 64, 128
> and 256 groups?
Ah! My bad. I've rerun hackbench with a larger number of groups and I
see a clear win for pipes with latency nice 19. Hackbench with sockets
also sees a small win.
o pipes

$ perf bench sched messaging -p -l 50000 -g <groups>

latency_nice:            0                    19                   -20
32-groups:       9.43 (0.00 pct)      6.42 (31.91 pct)     9.75 (-3.39 pct)
64-groups:      21.55 (0.00 pct)     12.97 (39.81 pct)    21.48 (0.32 pct)
128-groups:     41.15 (0.00 pct)     24.18 (41.23 pct)    46.69 (-13.46 pct)
256-groups:     78.87 (0.00 pct)     43.65 (44.65 pct)    78.84 (0.03 pct)
512-groups:    125.48 (0.00 pct)     78.91 (37.11 pct)   136.21 (-8.55 pct)
1024-groups:   292.81 (0.00 pct)    151.36 (48.30 pct)   323.57 (-10.50 pct)

o sockets

$ perf bench sched messaging -l 100000 -g <groups>

latency_nice:            0                    19                   -20
32-groups:      27.23 (0.00 pct)     27.00 (0.84 pct)     26.92 (1.13 pct)
64-groups:      45.71 (0.00 pct)     44.58 (2.47 pct)     45.86 (-0.32 pct)
128-groups:     79.55 (0.00 pct)     78.22 (1.67 pct)     80.01 (-0.57 pct)
256-groups:    161.41 (0.00 pct)    164.04 (-1.62 pct)   169.57 (-5.05 pct)
512-groups:    326.41 (0.00 pct)    310.00 (5.02 pct)    342.17 (-4.82 pct)
1024-groups:   634.36 (0.00 pct)    633.59 (0.12 pct)    640.05 (-0.89 pct)
Note: All tests were done in NPS1 mode.
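For anyone who wants to reproduce runs like the above, here is a
minimal sketch of one way to launch a benchmark at a given latency
nice value: a small wrapper that sets the latency nice of the current
task via sched_setattr() and then execs the benchmark. It assumes the
uapi proposed in this series (the sched_attr::sched_latency_nice field
and the SCHED_FLAG_LATENCY_NICE flag, 0x80 as proposed), that the value
is inherited across fork()/exec() like regular nice, and the wrapper
name "lnice" is only illustrative:

  /* lnice.c: run a command at a given latency nice value. */
  #include <stdint.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  /* Local copy of sched_attr with the field added by this series. */
  struct sched_attr {
          uint32_t size;
          uint32_t sched_policy;
          uint64_t sched_flags;
          int32_t  sched_nice;
          uint32_t sched_priority;
          /* SCHED_DEADLINE fields */
          uint64_t sched_runtime;
          uint64_t sched_deadline;
          uint64_t sched_period;
          /* utilization clamp fields */
          uint32_t sched_util_min;
          uint32_t sched_util_max;
          /* proposed by this series */
          int32_t  sched_latency_nice;
  };

  #define SCHED_FLAG_KEEP_ALL     0x18 /* keep current policy and params */
  #define SCHED_FLAG_LATENCY_NICE 0x80 /* flag proposed in this series */

  int main(int argc, char *argv[])
  {
          struct sched_attr attr;

          if (argc < 3) {
                  fprintf(stderr, "usage: %s <latency_nice> <cmd> [args]\n",
                          argv[0]);
                  return 1;
          }

          memset(&attr, 0, sizeof(attr));
          attr.size = sizeof(attr);
          attr.sched_flags = SCHED_FLAG_KEEP_ALL | SCHED_FLAG_LATENCY_NICE;
          attr.sched_latency_nice = atoi(argv[1]);

          /* No glibc wrapper for sched_setattr(); use the raw syscall. */
          if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
                  perror("sched_setattr");
                  return 1;
          }

          /* The benchmark and its children inherit the new setting. */
          execvp(argv[2], &argv[2]);
          perror("execvp");
          return 1;
  }

With that, the 64-groups pipe case above would look like:

  $ gcc -o lnice lnice.c
  $ ./lnice 19 perf bench sched messaging -p -l 50000 -g 64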
>
>>
>> [..snip..]
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> ~ Unixbench - DEFAULT_LATENCY_NICE ~
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> o NPS1
>>
>> Test                Metric  Parallelism                               tip            latency_nice
>> unixbench-dhry2reg  Hmean   unixbench-dhry2reg-1       48929419.48 ( 0.00%)     49137039.06 ( 0.42%)
>> unixbench-dhry2reg  Hmean   unixbench-dhry2reg-512   6275526953.25 ( 0.00%)   6265580479.15 ( -0.16%)
>> unixbench-syscall   Amean   unixbench-syscall-1          2994319.73 ( 0.00%)      3008596.83 * -0.48%*
>> unixbench-syscall   Amean   unixbench-syscall-512        7349715.87 ( 0.00%)      7420994.50 * -0.97%*
>> unixbench-pipe      Hmean   unixbench-pipe-1             2830206.03 ( 0.00%)      2854405.99 * 0.86%*
>> unixbench-pipe      Hmean   unixbench-pipe-512         326207828.01 ( 0.00%)    328997804.52 * 0.86%*
>> unixbench-spawn     Hmean   unixbench-spawn-1               6394.21 ( 0.00%)         6367.75 ( -0.41%)
>> unixbench-spawn     Hmean   unixbench-spawn-512            72700.64 ( 0.00%)        71454.19 * -1.71%*
>> unixbench-execl     Hmean   unixbench-execl-1               4723.61 ( 0.00%)         4750.59 ( 0.57%)
>> unixbench-execl     Hmean   unixbench-execl-512            11212.05 ( 0.00%)        11262.13 ( 0.45%)
>>
>> o NPS2
>>
>> Test                Metric  Parallelism                               tip            latency_nice
>> unixbench-dhry2reg  Hmean   unixbench-dhry2reg-1       49271512.85 ( 0.00%)     49245260.43 ( -0.05%)
>> unixbench-dhry2reg  Hmean   unixbench-dhry2reg-512   6267992483.03 ( 0.00%)   6264951100.67 ( -0.05%)
>> unixbench-syscall   Amean   unixbench-syscall-1          2995885.93 ( 0.00%)      3005975.10 * -0.34%*
>> unixbench-syscall   Amean   unixbench-syscall-512        7388865.77 ( 0.00%)      7276275.63 * 1.52%*
>> unixbench-pipe      Hmean   unixbench-pipe-1             2828971.95 ( 0.00%)      2856578.72 * 0.98%*
>> unixbench-pipe      Hmean   unixbench-pipe-512         326225385.37 ( 0.00%)    328941270.81 * 0.83%*
>> unixbench-spawn     Hmean   unixbench-spawn-1               6958.71 ( 0.00%)         6954.21 ( -0.06%)
>> unixbench-spawn     Hmean   unixbench-spawn-512            85443.56 ( 0.00%)        70536.42 * -17.45%* (0.67% vs 0.93% coeff. of var.)
>
> I don't expect any perf improvement or regression when the latency
> nice is not changed
This regression can be ignored. Although the results from back-to-back
runs are very stable, the results vary when the unixbench binaries are
rebuilt on my test setup:

                         tip        latency_nice
unixbench-spawn-512    73489.0       78260.4      (kexec)
unixbench-spawn-512    73332.7       77821.2      (reboot)
unixbench-spawn-512    86207.4       82281.2      (rebuilt + reboot)

I'll go back and look more into the spawn test because there is
something else at play there, but the other Unixbench results look
stable across the reruns.
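As an aside on the "coeff. of var." numbers quoted in the spawn-512
row above: the coefficient of variation is just the standard deviation
of the samples expressed as a percentage of their mean. A minimal
sketch of the computation (the helper name and the sample values are
only illustrative, not the actual unixbench samples):

  #include <math.h>
  #include <stdio.h>

  /* Coefficient of variation of n samples, as a percentage. */
  static double coeff_of_var(const double *s, int n)
  {
          double mean = 0.0, var = 0.0;
          int i;

          for (i = 0; i < n; i++)
                  mean += s[i];
          mean /= n;

          for (i = 0; i < n; i++)
                  var += (s[i] - mean) * (s[i] - mean);
          var /= n;    /* population variance */

          return 100.0 * sqrt(var) / mean;
  }

  int main(void)
  {
          double scores[] = { 73489.0, 73332.7, 73501.3 };

          printf("CV: %.2f%%\n", coeff_of_var(scores, 3));
          return 0;
  }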
>
>> unixbench-execl     Hmean   unixbench-execl-1               4767.99 ( 0.00%)         4752.63 * -0.32%*
>> unixbench-execl     Hmean   unixbench-execl-512            11250.72 ( 0.00%)        11320.97 ( 0.62%)
>>
>> o NPS4
>>
>> Test                Metric  Parallelism                               tip            latency_nice
>> unixbench-dhry2reg  Hmean   unixbench-dhry2reg-1       49041932.68 ( 0.00%)     49156671.05 ( 0.23%)
>> unixbench-dhry2reg  Hmean   unixbench-dhry2reg-512   6286981589.85 ( 0.00%)   6285248711.40 ( -0.03%)
>> unixbench-syscall   Amean   unixbench-syscall-1          2992405.60 ( 0.00%)      3008933.03 * -0.55%*
>> unixbench-syscall   Amean   unixbench-syscall-512        7971789.70 ( 0.00%)      7814622.23 * 1.97%*
>> unixbench-pipe      Hmean   unixbench-pipe-1             2822892.54 ( 0.00%)      2852615.11 * 1.05%*
>> unixbench-pipe      Hmean   unixbench-pipe-512         326408309.83 ( 0.00%)    329617202.56 * 0.98%*
>> unixbench-spawn     Hmean   unixbench-spawn-1               7685.31 ( 0.00%)         7243.54 ( -5.75%)
>> unixbench-spawn     Hmean   unixbench-spawn-512            72245.56 ( 0.00%)        77000.81 * 6.58%*
>> unixbench-execl     Hmean   unixbench-execl-1               4761.42 ( 0.00%)         4733.12 * -0.59%*
>> unixbench-execl     Hmean   unixbench-execl-512            11533.53 ( 0.00%)        11660.17 ( 1.10%)
>>
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> ~ Hackbench - Various Latency Nice Values ~
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> o 100000 loops
>>
>> - pipe (process)
>>
>> Test:              LN: 0               LN: 19              LN: -20
>> 1-groups:     3.91 (0.00 pct)     3.91 (0.00 pct)     3.81 (2.55 pct)
>> 2-groups:     4.48 (0.00 pct)     4.52 (-0.89 pct)    4.53 (-1.11 pct)
>> 4-groups:     4.83 (0.00 pct)     4.83 (0.00 pct)     4.87 (-0.82 pct)
>> 8-groups:     5.09 (0.00 pct)     5.00 (1.76 pct)     5.07 (0.39 pct)
>> 16-groups:    6.92 (0.00 pct)     6.79 (1.87 pct)     6.96 (-0.57 pct)
>>
>> - pipe (thread)
>>
>> Test:              LN: 0               LN: 19              LN: -20
>> 1-groups:     4.13 (0.00 pct)     4.08 (1.21 pct)     4.11 (0.48 pct)
>> 2-groups:     4.78 (0.00 pct)     4.90 (-2.51 pct)    4.79 (-0.20 pct)
>> 4-groups:     5.12 (0.00 pct)     5.08 (0.78 pct)     5.16 (-0.78 pct)
>> 8-groups:     5.31 (0.00 pct)     5.28 (0.56 pct)     5.33 (-0.37 pct)
>> 16-groups:    7.34 (0.00 pct)     7.27 (0.95 pct)     7.33 (0.13 pct)
>>
>> - socket (process)
>>
>> Test:              LN: 0               LN: 19              LN: -20
>> 1-groups:     6.61 (0.00 pct)     6.38 (3.47 pct)     6.54 (1.05 pct)
>> 2-groups:     6.59 (0.00 pct)     6.67 (-1.21 pct)    6.11 (7.28 pct)
>> 4-groups:     6.77 (0.00 pct)     6.78 (-0.14 pct)    6.79 (-0.29 pct)
>> 8-groups:     8.29 (0.00 pct)     8.39 (-1.20 pct)    8.36 (-0.84 pct)
>> 16-groups:   12.21 (0.00 pct)    12.03 (1.47 pct)    12.35 (-1.14 pct)
>>
>> - socket (thread)
>>
>> Test:              LN: 0               LN: 19              LN: -20
>> 1-groups:     6.50 (0.00 pct)     5.99 (7.84 pct)     6.02 (7.38 pct) ^
>> 2-groups:     6.07 (0.00 pct)     6.20 (-2.14 pct)    6.23 (-2.63 pct)
>> 4-groups:     6.61 (0.00 pct)     6.64 (-0.45 pct)    6.63 (-0.30 pct)
>> 8-groups:     8.87 (0.00 pct)     8.67 (2.25 pct)     8.78 (1.01 pct)
>> 16-groups:   12.63 (0.00 pct)    12.54 (0.71 pct)    12.59 (0.31 pct)
>>
>>> [..snip..]
>>>
>>
>> Apart from a couple of anomalies, latency nice reduces wait time,
>> especially when the system is heavily loaded. If there is any data, or
>> any specific workload you would like me to run on the test system,
>> please do let me know. Meanwhile, I'll try to get some numbers for
>> larger workloads like SpecJBB that did see improvements with latency
>> nice on v5.
Following are the results for SpecJBB in NPS1 mode:

+----------------------------------------------+
|                |   Latency Nice    |         |
|     Metric     |-------------------|   tip   |
|                |    0    |   19    |         |
|----------------|-------------------|---------|
| Max jOPS       | 100.00% | 102.19% | 101.02% |
| Critical jOPS  | 100.00% | 122.41% | 100.41% |
+----------------------------------------------+

SpecJBB throughput for Max-jOPS is similar across the board, but
Critical-jOPS throughput again sees a good uplift with latency nice 19.
>
> [..snip..]
>
If there is any specific workload you would like me to test, please do
let me know. I'll keep testing the workloads I come across with
different latency nice values and update this thread with the results.
Tested-by: K Prateek Nayak <kprateek.nayak@xxxxxxx>
--
Thanks and Regards,
Prateek