Re: [PATCH 0/7] introduce cpu.headroom knob to cpu controller

From: Song Liu
Date: Wed May 15 2019 - 11:45:02 EST


Hi Vincent,

> On May 15, 2019, at 3:18 AM, Vincent Guittot <vincent.guittot@xxxxxxxxxx> wrote:
>
> Hi Song,
>
> On Tue, 14 May 2019 at 22:58, Song Liu <songliubraving@xxxxxx> wrote:
>>
>> Hi Vincent,
>>
>
> [snip]
>
>>>
>>> Here are some more results with both Viresh's patch applied and
>>> cpu.headroom set. In these tests, the side job runs with SCHED_IDLE,
>>> so we get the benefit of Viresh's patch.
>>>
>>> We collected another metric here, the average "cpu time" used by the
>>> requests. We also present "wall time" and "wall - cpu" time. "wall
>>> time" is the same as "latency" in the previous results. Basically,
>>> "wall time" includes cpu time, scheduling latency, and time spent
>>> waiting for data (from the database, memcache, etc.). We don't have
>>> good data that separates scheduling latency from time spent waiting
>>> for data, so we present "wall - cpu" time, which is the sum of the
>>> two. Time spent waiting for data should not change in these tests,
>>> so changes in "wall - cpu" mostly come from scheduling latency.
>>> All the latency numbers are normalized based on the "wall time" of
>>> the first row.
>>>
>>> side job | cpu.headroom | cpu-idle | wall time | cpu time | wall - cpu
>>> ------------------------------------------------------------------------
>>> none | n/a | 42.4% | 1.00 | 0.31 | 0.69
>>> ffmpeg | 0 | 10.8% | 1.17 | 0.38 | 0.79
>>> ffmpeg | 25% | 22.8% | 1.08 | 0.35 | 0.73
>>>
>>> From these results, we can see that Viresh's patch reduces the
>>> latency overhead of the side job from 42% (in the previous results)
>>> to 17%, and a 25% cpu.headroom further reduces the latency overhead
>>> to 8%. cpu.headroom reduces both "cpu time" and "wall - cpu" time,
>>> which means it yields better IPC and lower scheduling latency.
>>>
>>> I think these data demonstrate that
>>>
>>> 1. Viresh's work is helpful in reducing the scheduling latency
>>> introduced by SCHED_IDLE side jobs.
>>> 2. The cpu.headroom work provides a mechanism to further reduce
>>> scheduling latency on top of Viresh's work.
>>>
>>> Therefore, the combination of the two gives us mechanisms to
>>> control the latency overhead of side workloads.
>>>
>>> @Vincent, do these data and analysis make sense from your point of view?
>>
>> Do you have further questions/concerns with this set?
>
> Viresh's patchset takes CPUs running only sched_idle tasks into
> account, but only in the fast wakeup path. Nothing special is (yet)
> done for the slow path or during idle load balance.
> The histogram that you provided for "Fallback to sched-idle CPU for
> better performance" shows that even though we have significantly
> reduced the long wakeup latencies, there are still some wakeup
> latencies evenly distributed in the range [16us-2msec]. Such values
> are most probably because a sched_other task doesn't always preempt
> a sched_idle task and sometimes waits for the next tick. This means
> that there is still margin for improving the results with sched_idle
> without adding a new knob.
> The headroom knob forces CPUs to be idle from time to time, and in
> that case the scheduler falls back to the normal scheduling policy
> that tries to fill idle CPUs. I'm still not convinced that most of
> the latency increase is linked to contention when accessing shared
> resources.

I would like to clarify that we are not trying to argue that most of
the latency increase comes from resource contention. Actually, we also
have data showing that scheduling latency contributes more to the
latency overhead:

side job | cpu.headroom | cpu-idle | wall time | cpu time | wall - cpu
------------------------------------------------------------------------
none | n/a | 42.4% | 1.00 | 0.31 | 0.69
ffmpeg | 0 | 10.8% | 1.17 | 0.38 | 0.79
ffmpeg | 25% | 22.8% | 1.08 | 0.35 | 0.73

Compared against the first row, the second row shows 17% latency
overhead (wall time). Of this 17%, 7% shows up in the "cpu time"
column, which comes from resource contention (lower IPC). The other
10% (wall - cpu) is mostly scheduling latency. These experiments
already include Viresh's current patch; without it, scheduling latency
contributes even more to the overall latency.

So we agree that, in this case, most of the increased latency comes
from scheduling latency.
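For reference, here is a minimal sketch of how the two metrics above
can be collected per request: "wall time" from CLOCK_MONOTONIC around
the request, "cpu time" from CLOCK_THREAD_CPUTIME_ID, and "wall - cpu"
as the difference. The handle_request() stub and its timings below are
made up for illustration; the real benchmark is more involved.

	#include <stdio.h>
	#include <time.h>
	#include <unistd.h>

	/* Dummy stand-in for a request handler: burn some cpu, then
	 * block as if waiting for data (database, memcache, etc.). */
	static void handle_request(void)
	{
		volatile unsigned long x = 0;
		unsigned long i;

		for (i = 0; i < 10000000UL; i++)
			x += i;
		usleep(2000);
	}

	static double ts_diff(const struct timespec *a,
			      const struct timespec *b)
	{
		return (b->tv_sec - a->tv_sec) +
		       (b->tv_nsec - a->tv_nsec) / 1e9;
	}

	int main(void)
	{
		struct timespec w0, w1, c0, c1;
		double wall, cpu;

		clock_gettime(CLOCK_MONOTONIC, &w0);
		clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c0);
		handle_request();
		clock_gettime(CLOCK_THREAD_CPUTIME_ID, &c1);
		clock_gettime(CLOCK_MONOTONIC, &w1);

		wall = ts_diff(&w0, &w1);
		cpu = ts_diff(&c0, &c1);

		/* "wall - cpu" bundles scheduling latency and time
		 * spent waiting for data, as described above. */
		printf("wall %.6f cpu %.6f wall-cpu %.6f\n",
		       wall, cpu, wall - cpu);
		return 0;
	}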

However, we still think cpu.headroom adds value. The following table
compares against ideal cases where the overhead of scheduling latency
is completely eliminated, i.e. "wall - cpu" stays at the baseline 0.69
and the estimated wall time is simply cpu time + 0.69 (for example,
0.38 + 0.69 = 1.07).

side job | cpu.headroom | cpu-idle | wall time | cpu time | wall - cpu
------------------------------------------------------------------------
none | n/a | 42.4% | 1.00 | 0.31 | 0.69
------------------------------------------------------------------------
the rows below are estimates, not experimental results
------------------------------------------------------------------------
ffmpeg | 0 | TBD | 1.07 | 0.38 | 0.69
ffmpeg | 25% | TBD | 1.04 | 0.35 | 0.69

We can see that cpu.headroom helps control latency even with ideal
scheduling. The saving here (from 7% overhead to 4%) is not as
significant, but it is still meaningful in some cases.

More importantly, cpu.headroom gives us a mechanism to control the
latency overhead. It is a _protection_ mechanism, not an optimization.
It is somewhat similar to the current cpu.max knob, which limits the
maximum cpu usage of a given workload. cpu.headroom is more flexible
than cpu.max, because it can adjust the limit dynamically based on the
system load level.
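To make the comparison concrete, here is a hypothetical usage sketch.
The cgroup paths, the choice of which cgroup the knob lives in, and
the "25" percentage format for cpu.headroom are my assumptions for
illustration, not the documented interface of this patchset; the
"$MAX $PERIOD" format for cpu.max is the existing cgroup v2 interface.

	/* Hypothetical configuration sketch for the discussion
	 * above; paths and the cpu.headroom value format are
	 * assumptions, see the patchset for the real interface. */
	#include <fcntl.h>
	#include <stdio.h>
	#include <unistd.h>

	static int write_knob(const char *path, const char *val)
	{
		int fd = open(path, O_WRONLY);

		if (fd < 0) {
			perror(path);
			return -1;
		}
		if (dprintf(fd, "%s\n", val) < 0)
			perror(path);
		return close(fd);
	}

	int main(void)
	{
		/* cpu.max: static cap on the side job, 50ms of cpu
		 * per 100ms period, no matter how busy or idle the
		 * main job is. */
		write_knob("/sys/fs/cgroup/side/cpu.max",
			   "50000 100000");

		/* cpu.headroom: the main job requests 25% idle
		 * headroom, so the effective limit on side jobs
		 * adjusts dynamically with the main job's load. */
		return write_knob("/sys/fs/cgroup/main/cpu.headroom",
				  "25");
	}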

Does this explanation make sense?

Thanks,
Song