Re: [PATCH 0/7] introduce cpu.headroom knob to cpu controller

From: Song Liu
Date: Wed Jun 26 2019 - 11:57:38 EST


Hi Michal,

> On Jun 26, 2019, at 1:26 AM, Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hello Song, and I apologize for the late reply.
>
> I understand the motivation for the headroom attribute is to achieve
> side-load throttling before the CPU is fully saturated, since your
> measurements show that something else gets saturated earlier than the
> CPU and causes the observed latency to grow.
>
> The second aspect of the headroom knob, i.e. dynamic partitioning of the
> CPU resource, is IMO something which we already have thanks to
> cpu.weight.

I think the cpu.headroom knob is a dynamic version of the cpu.max knob.
It serves a different role from cpu.weight.

cpu.weight is like: when both tasks can run, which one gets more cycles.
cpu.headroom is like: even when there are idle CPU cycles, the side
workload should not use them all.

>
> As you wrote, plain cpu.weight of workloads didn't work for you, so I
> think it'd be worth figuring out what is the resource whose saturation
> affects the overall observed latency and see if a protection/weights on
> that resource can be set (or implemented).

Our goal here is not to solve one particular case. Instead, we would like
a universal solution for different combinations of main workload and side
workload. cpu.headroom makes it easy to adjust the throttling based on the
requirements of the main workload.

Also, there are resources that can only be protected by intentionally
leaving some idle cycles. For example, SMT siblings share ALUs, so
sometimes we have to throttle one SMT sibling to make the other run faster.

>
> On Tue, May 21, 2019 at 04:27:02PM +0000, Song Liu <songliubraving@xxxxxx> wrote:
>> The overall latency (or wall latency) contains:
>>
>> (1) cpu time, which is (a) and (d) in the loop above;
> How do you measure this CPU time? Does it include time spent in the
> kernel? (Or can there be anything else unaccounted for in the following
> calculations?)

We measure how much time a thread is running, which includes kernel time.
I think we didn't measure the time spent processing IRQs, but that is
small compared with the overall latency.

>
>> (2) time waiting for data, which is (b);
> Is your assumption of this being constant supported by the measurements?

We don't measure that specifically. The data is fetched over the network
from other servers. The latency to fetch data is not constant, but the
average over thousands of requests should be the same for the different cases.

>
> The last note is regarding the semantics of the headroom knob; I'm not
> sure it fits well into the weight/allocation/limit/protection model. It
> seems to me that it's crafted to satisfy the division into one main
> workload and one side workload; however, the concept doesn't generalize
> well to an arbitrary number of siblings (e.g. two cgroups with the same
> headroom, a third with less, who is winning?).

The semantics are not very straightforward. We discussed this for a long
time, and it is really crafted for the protection model.

In your example, say both A and B have 30% headroom, and C has 20%. A and
B are "winning", as they will not be throttled. C will be throttled when
the global idleness is lower than 10% (30% - 20%).
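
In case it helps, here is that rule expressed as a toy C snippet. It is
just a paraphrase of the description above with made-up names, not the
scheduler-side implementation:

#include <stdbool.h>
#include <stdio.h>

/*
 * Paraphrase of the rule above: a cgroup is throttled when the global
 * idleness falls below the gap between the largest headroom among its
 * siblings and its own headroom (all values in percent).
 */
static bool should_throttle(int own_headroom, int max_sibling_headroom,
                            int global_idleness)
{
        return global_idleness < max_sibling_headroom - own_headroom;
}

int main(void)
{
        /* A and B: 30% headroom, C: 20%; say measured idleness is 8%. */
        printf("A throttled: %d\n", should_throttle(30, 30, 8)); /* 0 */
        printf("C throttled: %d\n", should_throttle(20, 30, 8)); /* 1 */
        return 0;
}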

Note that this is not a typical use case for cpu.headroom. If multiple
latency-sensitive applications share the same server, they would need
some partitioning scheme.

Thanks,
Song