Re: [PATCH v5 0/7] Add latency priority for CFS class
From: K Prateek Nayak
Date: Wed Oct 12 2022 - 10:54:02 EST
Hello Vincent,
Sharing results from testing on a dual socket Zen3 system (2 x 64C/128T).
tl;dr
o I don't see any regressions when workloads are running with
DEFAULT_LATENCY_NICE.
o I can reproduce results similar to the ones reported in Patch 4 for
hackbench with latency nice 19, and for hackbench and cyclictest
run with various combinations of latency nice values.
o I can see improvements to tail latency for schbench with hackbench
running in the background.
o There is unexpected non-linear behavior for a couple of cases that I
cannot explain yet (marked with "^" in the detailed results).
I have not yet gotten to the bottom of it, but if I've missed
something, please do let me know.
Detailed results are shared below:
On 9/25/2022 8:09 PM, Vincent Guittot wrote:
> This patchset restarts the work on adding a latency priority to describe
> the latency tolerance of cfs tasks.
>
> The patches [1-3] have been done by Parth:
> https://lore.kernel.org/lkml/20200228090755.22829-1-parth@xxxxxxxxxxxxx/
>
> I have just rebased and moved the setting of the latency priority outside
> the priority update. I have removed the reviewed tags because the patches
> are 2 years old.
>
> This aims to be a generic interface, and the following patches are one use
> of it to improve the scheduling latency of cfs tasks.
>
> The patch [4] uses the latency nice priority to define a latency offset
> and then to decide if a cfs task can or should preempt the currently
> running task. The patch gives some test results with cyclictest and
> hackbench to highlight the benefit of latency priority for short
> interactive tasks or long intensive tasks.
>
> Patch [5] adds the support of latency nice priority to task group by
> adding a cpu.latency.nice field. The range is [-20:19] as for setting task
> latency priority.
>
> Patch [6] makes sched_core take the latency offset into account.
>
> Patch [7] adds an rb tree to cover some corner cases where the latency
> sensitive task (priority < 0) is preempted by high priority tasks (RT/DL)
> or fails to preempt them. This patch ensures that tasks will have at least
> a slice of sched_min_granularity in priority at wakeup. The patch gives
> results to show the benefit in addition to patch 4.
>
> I have also backported the patchset on a dragonboard RB3 with an android
> mainline kernel based on v5.18 for a quick test. I have used the
> TouchLatency app, which is part of AOSP and described as a very good
> test to highlight jitter and jank frame sources of a system [1].
> In addition to the app, I have added some short running tasks waking up
> regularly (to use the 8 cpus for 4 ms every 37777us) to stress the system
> without overloading it (and disabling EAS). The first results show that the
> patchset helps to reduce the missed deadline frames from 5% to less than
> 0.1% when cpu.latency.nice is set for the task groups.
>
> I have also tested the patchset with the modified version of the alsa
> latency test that has been shared by Tim. The test quickly xruns with
> the default latency nice priority 0 but is able to run without underruns
> with latency nice -20 and hackbench running simultaneously.
>
>
> [1] https://source.android.com/docs/core/debug/eval_perf#touchlatency
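As an aside, for anyone who wants to exercise the cpu.latency.nice
interface from Patch 5 while reproducing these runs, here is a minimal
sketch in C. It assumes cgroup v2 is mounted at /sys/fs/cgroup and that
a group named "latsens" (a hypothetical name) has already been created;
treat it as an illustration, not a reference implementation:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* cpu.latency.nice is the task group knob added by Patch 5 */
	const char *path = "/sys/fs/cgroup/latsens/cpu.latency.nice";
	const char *val = "-20";	/* range is [-20:19], as for tasks */
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	if (write(fd, val, strlen(val)) < 0)
		perror("write");
	close(fd);
	return 0;
}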
Following are the results from running standard benchmarks on a
dual socket Zen3 (2 x 64C/128T) machine configured in different
NPS modes.
NPS modes are used to logically divide a single socket into
multiple NUMA regions.
Following is the NUMA configuration for each NPS mode on the system:
NPS1: Each socket is a NUMA node.
Total 2 NUMA nodes in the dual socket machine.
Node 0: 0-63, 128-191
Node 1: 64-127, 192-255
NPS2: Each socket is further logically divided into 2 NUMA regions.
Total 4 NUMA nodes exist over 2 sockets.
Node 0: 0-31, 128-159
Node 1: 32-63, 160-191
Node 2: 64-95, 192-223
Node 3: 96-127, 224-255
NPS4: Each socket is logically divided into 4 NUMA regions.
Total 8 NUMA nodes exist over 2 sockets.
Node 0: 0-15, 128-143
Node 1: 16-31, 144-159
Node 2: 32-47, 160-175
Node 3: 48-63, 176-191
Node 4: 64-79, 192-207
Node 5: 80-95, 208-223
Node 6: 96-111, 224-239
Node 7: 112-127, 240-255
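The per-node CPU lists above can be cross-checked against sysfs. A small
sketch that prints /sys/devices/system/node/node*/cpulist for the first
8 nodes (nodes that do not exist in the current NPS mode are skipped):

#include <stdio.h>

int main(void)
{
	char path[64], line[256];
	int n;

	for (n = 0; n < 8; n++) {
		FILE *f;

		snprintf(path, sizeof(path),
			 "/sys/devices/system/node/node%d/cpulist", n);
		f = fopen(path, "r");
		if (!f)
			continue;	/* node absent in this NPS mode */
		if (fgets(line, sizeof(line), f))
			printf("node%d: %s", n, line);
		fclose(f);
	}
	return 0;
}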
Benchmark Results:
Kernel versions:
- tip: 5.19.0 tip sched/core
- latency_nice: 5.19.0 tip sched/core + this series
When we started testing, the tip was at:
commit 7e9518baed4c ("sched/fair: Move call to list_last_entry() in detach_tasks")
Results marked with "*" below were rerun to verify; the rerun is
reported on the following line as [Verification Run].
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ hackbench - DEFAULT_LATENCY_NICE ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(time in seconds; lower is better)
NPS1
Test: tip latency_nice
1-groups: 4.23 (0.00 pct) 4.06 (4.01 pct)
2-groups: 4.93 (0.00 pct) 4.89 (0.81 pct)
4-groups: 5.32 (0.00 pct) 5.31 (0.18 pct)
8-groups: 5.46 (0.00 pct) 5.54 (-1.46 pct)
16-groups: 7.31 (0.00 pct) 7.33 (-0.27 pct)
NPS2
Test: tip latency_nice
1-groups: 4.19 (0.00 pct) 4.12 (1.67 pct)
2-groups: 4.77 (0.00 pct) 4.82 (-1.04 pct)
4-groups: 5.15 (0.00 pct) 5.17 (-0.38 pct)
8-groups: 5.47 (0.00 pct) 5.48 (-0.18 pct)
16-groups: 6.63 (0.00 pct) 6.65 (-0.30 pct)
NPS4
Test: tip latency_nice
1-groups: 4.23 (0.00 pct) 4.31 (-1.89 pct)
2-groups: 4.78 (0.00 pct) 4.75 (0.62 pct)
4-groups: 5.17 (0.00 pct) 5.24 (-1.35 pct)
8-groups: 5.63 (0.00 pct) 5.59 (0.71 pct)
16-groups: 7.88 (0.00 pct) 7.09 (10.02 pct)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ schbench - DEFAULT_LATENCY_NICE ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(99th percentile latency in usec; lower is better)
NPS1
#workers: tip latency_nice
1: 22.00 (0.00 pct) 21.00 (4.54 pct)
2: 34.00 (0.00 pct) 34.00 (0.00 pct)
4: 37.00 (0.00 pct) 40.00 (-8.10 pct)
8: 55.00 (0.00 pct) 49.00 (10.90 pct)
16: 69.00 (0.00 pct) 66.00 (4.34 pct)
32: 113.00 (0.00 pct) 117.00 (-3.53 pct)
64: 219.00 (0.00 pct) 242.00 (-10.50 pct) *
64: 219.00 (0.00 pct) 194.00 (11.41 pct) [Verification Run]
128: 506.00 (0.00 pct) 513.00 (-1.38 pct)
256: 45440.00 (0.00 pct) 44992.00 (0.98 pct)
512: 76672.00 (0.00 pct) 83328.00 (-8.68 pct)
NPS2
#workers: tip latency_nice
1: 31.00 (0.00 pct) 20.00 (35.48 pct)
2: 36.00 (0.00 pct) 28.00 (22.22 pct)
4: 45.00 (0.00 pct) 37.00 (17.77 pct)
8: 47.00 (0.00 pct) 51.00 (-8.51 pct)
16: 66.00 (0.00 pct) 69.00 (-4.54 pct)
32: 114.00 (0.00 pct) 113.00 (0.87 pct)
64: 215.00 (0.00 pct) 215.00 (0.00 pct)
128: 495.00 (0.00 pct) 529.00 (-6.86 pct) *
128: 495.00 (0.00 pct) 416.00 (15.95 pct) [Verification Run]
256: 48576.00 (0.00 pct) 46912.00 (3.42 pct)
512: 79232.00 (0.00 pct) 82560.00 (-4.20 pct)
NPS4
#workers: tip latency_nice
1: 30.00 (0.00 pct) 34.00 (-13.33 pct)
2: 34.00 (0.00 pct) 42.00 (-23.52 pct)
4: 41.00 (0.00 pct) 42.00 (-2.43 pct)
8: 60.00 (0.00 pct) 55.00 (8.33 pct)
16: 68.00 (0.00 pct) 69.00 (-1.47 pct)
32: 116.00 (0.00 pct) 115.00 (0.86 pct)
64: 224.00 (0.00 pct) 223.00 (0.44 pct)
128: 495.00 (0.00 pct) 677.00 (-36.76 pct) *
128: 495.00 (0.00 pct) 388.00 (21.61 pct) [Verification Run]
256: 45888.00 (0.00 pct) 44608.00 (2.78 pct)
512: 78464.00 (0.00 pct) 81536.00 (-3.91 pct)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ tbench - DEFAULT_LATENCY_NICE ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(throughput in MB/s; higher is better)
NPS1
Clients: tip latency_nice
1 550.66 (0.00 pct) 546.63 (-0.73 pct)
2 1009.69 (0.00 pct) 1016.40 (0.66 pct)
4 1795.32 (0.00 pct) 1773.95 (-1.19 pct)
8 2971.16 (0.00 pct) 2930.26 (-1.37 pct)
16 4627.98 (0.00 pct) 4727.82 (2.15 pct)
32 8065.15 (0.00 pct) 9019.11 (11.82 pct)
64 14994.32 (0.00 pct) 15100.22 (0.70 pct)
128 5175.73 (0.00 pct) 18223.69 (252.09 pct) *
128 20029.53 (0.00 pct) 20517.17 (2.43 pct) [Verification Run]
256 48763.57 (0.00 pct) 44463.63 (-8.81 pct)
512 43780.78 (0.00 pct) 44170.21 (0.88 pct)
1024 40341.84 (0.00 pct) 40883.10 (1.34 pct)
NPS2
Clients: tip latency_nice
1 551.06 (0.00 pct) 547.43 (-0.65 pct)
2 1000.76 (0.00 pct) 1014.83 (1.40 pct)
4 1737.02 (0.00 pct) 1742.30 (0.30 pct)
8 2992.31 (0.00 pct) 2951.59 (-1.36 pct)
16 4579.29 (0.00 pct) 4558.05 (-0.46 pct)
32 9120.73 (0.00 pct) 8122.06 (-10.94 pct) *
32 8814.62 (0.00 pct) 8965.54 (1.71 pct) [Verification Run]
64 14918.58 (0.00 pct) 14890.93 (-0.18 pct)
128 20830.61 (0.00 pct) 20410.48 (-2.01 pct)
256 47708.18 (0.00 pct) 45312.84 (-5.02 pct) *
256 44941.88 (0.00 pct) 44555.92 (-0.85 pct) [Verification Run]
512 43721.79 (0.00 pct) 43653.43 (-0.15 pct)
1024 40920.49 (0.00 pct) 41162.17 (0.59 pct)
NPS4
Clients: tip latency_nice
1 549.22 (0.00 pct) 539.81 (-1.71 pct)
2 1000.08 (0.00 pct) 1010.12 (1.00 pct)
4 1794.78 (0.00 pct) 1736.06 (-3.27 pct)
8 3008.50 (0.00 pct) 2952.68 (-1.85 pct)
16 4804.71 (0.00 pct) 4454.17 (-7.29 pct) *
16 4391.10 (0.00 pct) 4497.43 (2.42 pct) [Verification Run]
32 9156.57 (0.00 pct) 8820.05 (-3.67 pct)
64 14901.45 (0.00 pct) 14786.25 (-0.77 pct)
128 20771.20 (0.00 pct) 19955.11 (-3.92 pct)
256 47033.88 (0.00 pct) 44937.51 (-4.45 pct)
512 43429.01 (0.00 pct) 42638.81 (-1.81 pct)
1024 39271.27 (0.00 pct) 40044.17 (1.96 pct)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ stream - DEFAULT_LATENCY_NICE ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(memory bandwidth in MB/s; higher is better)
NPS1
10 Runs:
Test: tip latency_nice
Copy: 336311.52 (0.00 pct) 326015.98 (-3.06 pct)
Scale: 212955.82 (0.00 pct) 208667.27 (-2.01 pct)
Add: 251518.23 (0.00 pct) 237286.20 (-5.65 pct)
Triad: 262077.88 (0.00 pct) 258949.80 (-1.19 pct)
100 Runs:
Test: tip latency_nice
Copy: 339533.83 (0.00 pct) 335126.73 (-1.29 pct)
Scale: 194736.72 (0.00 pct) 221151.24 (13.56 pct)
Add: 218294.54 (0.00 pct) 251427.43 (15.17 pct)
Triad: 262371.40 (0.00 pct) 260100.85 (-0.86 pct)
NPS2
10 Runs:
Test: tip latency_nice
Copy: 335277.15 (0.00 pct) 339614.38 (1.29 pct)
Scale: 220990.24 (0.00 pct) 221052.78 (0.02 pct)
Add: 264156.13 (0.00 pct) 263684.19 (-0.17 pct)
Triad: 268707.53 (0.00 pct) 272610.96 (1.45 pct)
100 Runs:
Test: tip latency_nice
Copy: 334913.73 (0.00 pct) 339001.88 (1.22 pct)
Scale: 230522.47 (0.00 pct) 229848.86 (-0.29 pct)
Add: 264567.28 (0.00 pct) 264288.34 (-0.10 pct)
Triad: 272974.23 (0.00 pct) 272045.17 (-0.34 pct)
NPS4
10 Runs:
Test: tip latency_nice
Copy: 299432.31 (0.00 pct) 307649.18 (2.74 pct)
Scale: 217998.17 (0.00 pct) 205763.70 (-5.61 pct)
Add: 234305.46 (0.00 pct) 226381.75 (-3.38 pct)
Triad: 244369.15 (0.00 pct) 254225.30 (4.03 pct)
100 Runs:
Test: tip latency_nice
Copy: 344421.25 (0.00 pct) 322189.81 (-6.45 pct)
Scale: 237998.44 (0.00 pct) 227709.58 (-4.32 pct)
Add: 257501.82 (0.00 pct) 244009.58 (-5.23 pct)
Triad: 267686.50 (0.00 pct) 251840.25 (-5.91 pct)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ Test cases for Latency Nice ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Note: Latency Nice may be abbreviated to LN in the data below. The Latency
Nice value was set for all the workload threads using a wrapper script
during testing (a sketch of such a wrapper follows this note).
All the test results reported below are for the NPS1 configuration.
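For reference, here is a minimal sketch of a wrapper along the lines of
the script mentioned in the note above: it sets the latency nice value
of the calling task and then execs the workload, which inherits it. The
sched_attr layout, the sched_latency_nice field, and the
SCHED_FLAG_LATENCY_NICE flag follow the uAPI proposed in Patches 1-3 of
this series (treat the exact flag value as an assumption); glibc has no
sched_setattr() wrapper, so a raw syscall is used.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copy of the extended sched_attr proposed by this series */
struct sched_attr {
	uint32_t size;
	uint32_t sched_policy;
	uint64_t sched_flags;
	int32_t  sched_nice;
	uint32_t sched_priority;
	uint64_t sched_runtime;
	uint64_t sched_deadline;
	uint64_t sched_period;
	uint32_t sched_util_min;
	uint32_t sched_util_max;
	int32_t  sched_latency_nice;	/* new field from Patches 1-3 */
};

#define SCHED_FLAG_LATENCY_NICE	0x80	/* value as proposed by the series */

int main(int argc, char *argv[])
{
	struct sched_attr attr = {
		.size		= sizeof(attr),
		.sched_policy	= 0,	/* SCHED_OTHER */
		.sched_flags	= SCHED_FLAG_LATENCY_NICE,
	};

	if (argc < 3) {
		fprintf(stderr, "usage: %s <latency_nice> <cmd> [args...]\n",
			argv[0]);
		return 1;
	}
	attr.sched_latency_nice = atoi(argv[1]);	/* [-20:19] */

	/* pid 0 == calling thread; fails with EINVAL on unpatched kernels */
	if (syscall(SYS_sched_setattr, 0, &attr, 0)) {
		perror("sched_setattr");
		return 1;
	}
	execvp(argv[2], &argv[2]);
	perror("execvp");
	return 1;
}

Invoked as, e.g., "./ln_wrap 19 perf bench sched messaging -p -t -l
100000 -g 16" (the binary name is made up), every thread the workload
spawns should then start with the requested latency nice value.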
o Hackbench Pipes (100000 loops, threads)
Test: tip Latency Nice: -20 Latency Nice: 0 Latency Nice: 19
1-groups: 4.23 (0.00 pct) 4.39 (-3.78 pct) 3.99 (5.67 pct) 3.88 (8.27 pct)
2-groups: 4.93 (0.00 pct) 4.91 (0.40 pct) 4.69 (4.86 pct) 4.59 (6.89 pct)
4-groups: 5.32 (0.00 pct) 5.37 (-0.93 pct) 5.19 (2.44 pct) 5.05 (5.07 pct)
8-groups: 5.46 (0.00 pct) 5.90 (-8.05 pct) 5.34 (2.19 pct) 5.17 (5.31 pct)
16-groups: 7.31 (0.00 pct) 7.99 (-9.30 pct) 6.96 (4.78 pct) 6.51 (10.94 pct)
o Only Hackbench with different Latency Nice Values
> Loops: 100000
- Pipe (Process)
Test: Latency Nice: 0 Latency Nice: -20 Latency Nice: 19
1-groups: 3.77 (0.00 pct) 4.23 (-12.20 pct) 3.83 (-1.59 pct)
2-groups: 4.39 (0.00 pct) 4.73 (-7.74 pct) 4.31 (1.82 pct)
4-groups: 4.80 (0.00 pct) 5.07 (-5.62 pct) 4.68 (2.50 pct)
8-groups: 4.95 (0.00 pct) 5.68 (-14.74 pct) 4.76 (3.83 pct)
16-groups: 6.47 (0.00 pct) 7.87 (-21.63 pct) 6.08 (6.02 pct)
- Socket (Thread)
Test: Latency Nice: 0 Latency Nice: -20 Latency Nice: 19
1-groups: 6.08 (0.00 pct) 5.99 (1.48 pct) 6.08 (0.00 pct)
2-groups: 6.15 (0.00 pct) 6.25 (-1.62 pct) 6.14 (0.16 pct)
4-groups: 6.39 (0.00 pct) 6.42 (-0.46 pct) 6.44 (-0.78 pct)
8-groups: 8.51 (0.00 pct) 9.01 (-5.87 pct) 8.36 (1.76 pct)
16-groups: 12.48 (0.00 pct) 15.32 (-22.75 pct) 12.72 (-1.92 pct)
- Socket (Process)
Test: Latency Nice: 0 Latency Nice: -20 Latency Nice: 19
1-groups: 6.44 (0.00 pct) 5.50 (14.59 pct) ^ 6.43 (0.15 pct)
2-groups: 6.55 (0.00 pct) 5.56 (15.11 pct) ^ 6.36 (2.90 pct)
4-groups: 6.74 (0.00 pct) 6.19 (8.16 pct) ^ 6.69 (0.74 pct)
8-groups: 8.03 (0.00 pct) 8.29 (-3.23 pct) 8.02 (0.12 pct)
16-groups: 12.25 (0.00 pct) 14.11 (-15.18 pct) 12.41 (-1.30 pct)
> Loops: 2160 (Same as in testing)
- Pipe (Thread)
Test: Latency Nice: 0 Latency Nice: -20 Latency Nice: 19
1-groups: 0.10 (0.00 pct) 0.12 (-20.00 pct) 0.10 (0.00 pct)
2-groups: 0.12 (0.00 pct) 0.15 (-25.00 pct) 0.11 (8.33 pct)
4-groups: 0.14 (0.00 pct) 0.18 (-28.57 pct) 0.15 (-7.14 pct)
8-groups: 0.17 (0.00 pct) 0.24 (-41.17 pct) 0.17 (0.00 pct)
16-groups: 0.26 (0.00 pct) 0.33 (-26.92 pct) 0.21 (19.23 pct)
- Pipe (Process)
Test: Latency Nice: 0 Latency Nice: -20 Latency Nice: 19
1-groups: 0.10 (0.00 pct) 0.12 (-20.00 pct) 0.10 (0.00 pct)
2-groups: 0.12 (0.00 pct) 0.16 (-33.33 pct) 0.12 (0.00 pct)
4-groups: 0.14 (0.00 pct) 0.17 (-21.42 pct) 0.13 (7.14 pct)
8-groups: 0.16 (0.00 pct) 0.24 (-50.00 pct) 0.16 (0.00 pct)
16-groups: 0.23 (0.00 pct) 0.33 (-43.47 pct) 0.19 (17.39 pct)
- Socket (Thread)
Test: Latency Nice: 0 Latency Nice: -20 Latency Nice: 19
1-groups: 0.19 (0.00 pct) 0.18 (5.26 pct) 0.18 (5.26 pct)
2-groups: 0.21 (0.00 pct) 0.21 (0.00 pct) 0.20 (4.76 pct)
4-groups: 0.22 (0.00 pct) 0.25 (-13.63 pct) 0.22 (0.00 pct)
8-groups: 0.27 (0.00 pct) 0.36 (-33.33 pct) 0.27 (0.00 pct)
16-groups: 0.42 (0.00 pct) 0.55 (-30.95 pct) 0.40 (4.76 pct)
- Socket (Process)
Test: Latency Nice: 0 Latency Nice: -20 Latency Nice: 19
1-groups: 0.17 (0.00 pct) 0.17 (0.00 pct) 0.17 (0.00 pct)
2-groups: 0.19 (0.00 pct) 0.20 (-5.26 pct) 0.19 (0.00 pct)
4-groups: 0.20 (0.00 pct) 0.22 (-10.00 pct) 0.20 (0.00 pct)
8-groups: 0.25 (0.00 pct) 0.32 (-28.00 pct) 0.25 (0.00 pct)
16-groups: 0.40 (0.00 pct) 0.51 (-27.50 pct) 0.39 (2.50 pct)
o Hackbench and Cyclictest in NPS1 configuration (cyclictest latencies in usec)
perf bench sched messaging -p -t -l 100000 -g 16&
cyclictest --policy other -D 5 -q -n -H 20000
-----------------------------------------------------------------------------------------------------------------
|Hackbench | Cyclictest LN = 19 | Cyclictest LN = 0 | Cyclictest LN = -20 |
|LN |--------------------------------|---------------------------------|-----------------------------|
|v | Min | Avg | Max | Min | Avg | Max | Min | Avg | Max |
|--------------|--------|---------|-------------|----------|---------|------------|----------|---------|--------|
|0 | 54.00 | 117.00 | 3021.67 | 53.67 | 65.33 | 133.00 | 53.67 | 65.00 | 201.33 | ^
|19 | 50.00 | 100.67 | 3099.33 | 41.00 | 64.33 | 1014.33 | 54.00 | 63.67 | 213.33 |
|-20 | 53.00 | 169.00 | 11661.67 | 53.67 | 217.33 | 14313.67 | 46.00 | 61.33 | 236.00 | ^
-----------------------------------------------------------------------------------------------------------------
o Hackbench and schbench in NPS1 configuration (schbench latencies in usec)
perf bench sched messaging -p -t -l 1000000 -g 16&
schbench -m 1 -t 64 -s 30s
------------------------------------------------------------------------------------------------------------
|Hackbench | schbench LN = 19 | schbench LN = 0 | schbench LN = -20 |
|LN |----------------------------|--------------------------------|-----------------------------|
|v | 90th | 95th | 99th | 90th | 95th | 99th | 90th | 95th | 99th |
|--------------|--------|--------|----------|---------|---------|------------|---------|----------|--------|
|0 | 4264 | 6744 | 15664 | 17952 | 32672 | 55488 | 15088 | 25312 | 50112 |
|19 | 288 | 613 | 2332 | 274 | 1015 | 3628 | 374 | 1394 | 4424 |
|-20 | 35904 | 47680 | 79744 | 87168 | 113536 | 176896 | 13008 | 21216 | 42560 | ^
------------------------------------------------------------------------------------------------------------
o SpecJBB Multi-JVM
---------------------------------------------
| Latency Nice | 0 | 19 |
---------------------------------------------
| max-jOPS | 100% | 109.92% |
| critical-jOPS | 100% | 153.70% |
---------------------------------------------
In most cases, latency nice delivers what it promises.
Some cases, marked with "^", have shown anomalies or non-linear behavior
that is yet to be root caused. If you've seen something similar during
your testing, I would love to know what could lead to such behavior.
If you would like more details on the benchmark results reported above,
or if there is any specific workload you would like me to test on the
Zen3 machine, please do let me know.
>
> [..snip..]
>
--
Thanks and Regards,
Prateek