Re: [PATCH v5 0/7] Add latency priority for CFS class

From: Vincent Guittot
Date: Thu Oct 27 2022 - 12:35:13 EST


Hi Prateek,

On Tue, 25 Oct 2022 at 08:36, K Prateek Nayak <kprateek.nayak@xxxxxxx> wrote:
>
> Hello Vincent,
>
> I've rerun some tests with a different configuration with more
> contention for CPU and I can see a linear behavior. Sharing the
> results below.
>
> On 10/13/2022 8:54 PM, Vincent Guittot wrote:
> >
> > [..snip..]
> >>
> >> o Hackbench and Cyclictest in NPS1 configuration
> >>
> >> perf bench sched messaging -p -t -l 100000 -g 16&
> >> cyclictest --policy other -D 5 -q -n -H 20000
> >>
> >> -------------------------------------------------------------------------------------------------
> >> | Hackbench |    Cyclictest LN = 19     |     Cyclictest LN = 0     |    Cyclictest LN = -20    |
> >> | LN        |---------------------------|---------------------------|---------------------------|
> >> | v         |  Min  |  Avg   |   Max    |  Min  |  Avg   |   Max    |  Min  |  Avg   |   Max    |
> >> |-----------|-------|--------|----------|-------|--------|----------|-------|--------|----------|
> >> | 0         | 54.00 | 117.00 |  3021.67 | 53.67 |  65.33 |   133.00 | 53.67 |  65.00 |   201.33 | ^
> >> | 19        | 50.00 | 100.67 |  3099.33 | 41.00 |  64.33 |  1014.33 | 54.00 |  63.67 |   213.33 |
> >> | -20       | 53.00 | 169.00 | 11661.67 | 53.67 | 217.33 | 14313.67 | 46.00 |  61.33 |   236.00 | ^
> >> -------------------------------------------------------------------------------------------------
> >
> > The latency results look good with Cyclictest LN:0 and hackbench LN:0.
> > 133us max latency. This suggests that your system is not overloaded
> > and cyclictest doesn't really compete with others to run.
>
> Following is the result of running cyclictest alongside hackbench with 32 groups:
>
> perf bench sched messaging -p -l 100000 -g 32&
> cyclictest --policy other -D 5 -q -n -H 20000
>
> -------------------------------------------------------------------------------------------------
> | Hackbench |    Cyclictest LN = 19     |     Cyclictest LN = 0     |    Cyclictest LN = -20    |
> | LN        |---------------------------|---------------------------|---------------------------|
> |           |  Min  |  Avg   |   Max    |  Min  |  Avg   |   Max    |  Min  |  Avg   |   Max    |
> |-----------|-------|--------|----------|-------|--------|----------|-------|--------|----------|
> | 0         | 54.00 | 165.00 |  6899.00 | 22.00 |  85.00 |  3294.00 | 23.00 |  64.00 |   276.00 |
> | 19        | 53.00 | 173.00 |  3275.00 | 40.00 |  60.00 |  2276.00 | 13.00 |  59.00 |    94.00 |
> | -20       | 52.00 | 293.00 | 19980.00 | 52.00 | 280.00 | 14305.00 | 53.00 |  95.00 |  5713.00 |
> -------------------------------------------------------------------------------------------------
>
> I see a spike for Max in (0, 0) configuration and the latency decreases
> monotonically with lower latency nice value.

Your results look good.
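
As a quick sanity check, the trend can be verified mechanically. The
sketch below is illustrative only: it hard-codes the Avg and Max columns
(in us) from the 32-group table above, and the helper names are
hypothetical, not part of any tool used in this thread.

```python
# Avg and Max cyclictest latency (us), copied from the 32-group table.
# Outer key: hackbench latency nice value.
# Inner list order: cyclictest LN = 19, then 0, then -20.
RESULTS = {
    0:   {"Avg": [165.0, 85.0, 64.0],  "Max": [6899.0, 3294.0, 276.0]},
    19:  {"Avg": [173.0, 60.0, 59.0],  "Max": [3275.0, 2276.0, 94.0]},
    -20: {"Avg": [293.0, 280.0, 95.0], "Max": [19980.0, 14305.0, 5713.0]},
}


def strictly_decreasing(values):
    """True if every value is smaller than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


def non_monotonic_cells(results):
    """List (hackbench LN, metric) pairs whose latency does not strictly
    decrease as the cyclictest latency nice value is lowered."""
    return [
        (hackbench_ln, metric)
        for hackbench_ln, metrics in results.items()
        for metric, values in metrics.items()
        if not strictly_decreasing(values)
    ]


print(non_monotonic_cells(RESULTS))  # prints []
```

An empty list means Avg and Max both decrease for every hackbench
latency nice value. Min is left out on purpose, since it is not expected
to track the latency nice value.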

>
> >
> >>
> >> o Hackbench and schbench in NPS1 configuration
> >>
> >> perf bench sched messaging -p -t -l 1000000 -g 16&
> >> schbench -m 1 -t 64 -s 30s
> >>
> >> -------------------------------------------------------------------------------------------
> >> | Hackbench |    schbench LN = 19     |     schbench LN = 0     |    schbench LN = -20    |
> >> | LN        |-------------------------|-------------------------|-------------------------|
> >> | v         | 90th  |  95th  |  99th  | 90th  |  95th  |  99th  | 90th  |  95th  |  99th  |
> >> |-----------|-------|--------|--------|-------|--------|--------|-------|--------|--------|
> >> | 0         |  4264 |   6744 |  15664 | 17952 |  32672 |  55488 | 15088 |  25312 |  50112 |
> >> | 19        |   288 |    613 |   2332 |   274 |   1015 |   3628 |   374 |   1394 |   4424 |
> >> | -20       | 35904 |  47680 |  79744 | 87168 | 113536 | 176896 | 13008 |  21216 |  42560 | ^
> >> -------------------------------------------------------------------------------------------
> >
> > For schbench, your test is 30 seconds long, which is longer than
> > the duration of perf bench sched messaging -p -t -l 1000000 -g 16&
> >
> > The duration of the latter varies depending on the latency nice value, so
> > schbench is disturbed for a longer time in some cases
>
> I've rerun this with hackbench running 128 groups alongside schbench
> with 2 messengers and 1 worker each. With a larger worker count, I still
> see non-monotonic behavior in the 99th percentile latency of schbench.
> I also see the number of latency samples collected by schbench vary
> over the 30 second run for different latency nice values, which could
> also play a part in the unexpected behavior. For a lower worker
> count, the number of samples collected is similar. Following
> is the configuration and the latency reported by schbench:
>
> perf bench sched messaging -p -t -l 150000 -g 128&
> schbench -m 2 -t 1 -s 30s
>
> Note: In all cases, hackbench runs longer than schbench.
>
> ----------------------------------------------------------------------------------
> | Hackbench |   schbench LN = 19   |   schbench LN = 0    |  schbench LN = -20   |
> | LN        |----------------------|----------------------|----------------------|
> |           | 90th | 95th |  99th  | 90th | 95th |  99th  | 90th | 95th |  99th  |
> |-----------|------|------|--------|------|------|--------|------|------|--------|
> | 0         |   42 |   92 |   2972 |   26 |   49 |   2356 |    9 |   11 |     20 |
> | 19        |   35 |  424 |   4984 |   13 |  390 |   5096 |    8 |   10 |     14 | ^
> | -19       |  144 | 3516 | 110208 |   61 |  807 |  34880 |   25 |   39 |    295 |
> ----------------------------------------------------------------------------------
>
> I see 90th and 95th percentile latency decrease monotonically with
> latency nice value of schbench (for a fixed latency nice value of
> hackbench) but there are cases where 99th percentile latency
> reported by schbench may not strictly decrease with lower latency
> nice value (marked with ^).
>
> Note: Only a small number of bad samples can affect the 99th
> percentile latency for the above configuration. The monotonic
> behavior in 90th and 95th percentile latency is a good data point
> to show latency nice is indeed working as expected.

Yes, I think you are right that the 99th percentile is not stable
enough, because it can be impacted by a small number of bad samples.
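
To illustrate the point, here is a minimal sketch with purely synthetic
numbers (not measured data). It assumes a nearest-rank percentile, which
may differ from how schbench computes its percentiles, and every name in
it is hypothetical.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, to avoid float rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def summarize(n_good, n_bad, good_us=10, bad_us=5000):
    """Build a synthetic run of n_good fast wakeups and n_bad slow ones,
    then report its 90th, 95th and 99th percentile latency."""
    samples = [good_us] * n_good + [bad_us] * n_bad
    return {pct: percentile(samples, pct) for pct in (90, 95, 99)}


# 1000 samples in both runs; only the count of slow samples differs.
few_bad = summarize(n_good=995, n_bad=5)
more_bad = summarize(n_good=985, n_bad=15)

print(few_bad)   # {90: 10, 95: 10, 99: 10}
print(more_bad)  # {90: 10, 95: 10, 99: 5000}
```

Ten extra slow samples out of 1000 leave the 90th and 95th percentiles
untouched but move the 99th from 10 to 5000, which is why the lower
percentiles are the more reliable data point here.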

>
> If there is any specific workload you would like me to run on the
> test system, or any additional data you would like for above
> workloads, please do let me know.

Thanks a lot for your tests.
I'm about to send v6.

>
> --
> Thanks and Regards,
> Prateek