Re: [PATCH 2/2] sched/fair: Always propagate runnable_load_avg

From: Vincent Guittot
Date: Thu Apr 27 2017 - 04:29:42 EST


On 27 April 2017 at 00:52, Tejun Heo <tj@xxxxxxxxxx> wrote:
> Hello,
>
> On Wed, Apr 26, 2017 at 08:12:09PM +0200, Vincent Guittot wrote:
>> On 24 April 2017 at 22:14, Tejun Heo <tj@xxxxxxxxxx> wrote:
>> Can the problem be on the load balance side instead ? and more
>> precisely in the wakeup path ?
>> After looking at the trace, it seems that task placement happens at
>> wake up path and if it fails to select the right idle cpu at wake up,
>> you will have to wait for a load balance which is alreayd too late
>
> Oh, I was tracing most of scheduler activities and the ratios of
> wakeups picking idle CPUs were about the same regardless of cgroup
> membership. I can confidently say that the latency issue that I'm
> seeing is from load balancer picking the wrong busiest CPU, which is
> not to say that there can be other problems.

ok. Is there any trace that you can share ? your behavior seems
different of mine

>
>> > another queued wouldn't report the correspondingly higher
>>
>> It will as load_avg includes the runnable_load_avg so whatever load is
>> in runnable_load_avg will be in load_avg too. But at the contrary,
>> runnable_load_avg will not have the blocked that is going to wake up
>> soon in the case of schbench
>
> Decaying contribution of blocked tasks don't affect the busiest CPU
> selection. Without cgroup, runnable_load_avg is immediately increased
> and decreased as tasks enter and leave the queue and otherwise we end
> up with CPUs which are idle when there are threads queued on different
> CPUs accumulating scheduling latencies.
>
> The patch doesn't change how the busiest CPU is picked. It already
> uses runnable_load_avg. The change that cgroup causes is that it
> blocks updates to runnable_load_avg from newly scheduled or sleeping
> tasks.
>
> The issue isn't about whether runnable_load_avg or load_avg should be
> used but the unexpected differences in the metrics that the load

I think that's the root of the problem. I explain a bit more my view
on the other thread

> balancer uses depending on whether cgroup is used or not.
>
>> One last thing, the load_avg of an idle CPU can stay blocked for a
>> while (until a load balance happens that will update blocked load) and
>> can be seen has "busy" whereas it is not. Could it be a reason of your
>> problem ?
>
> AFAICS, the load balancer doesn't use load_avg.
>
> Thanks.
>
> --
> tejun