Re: [PATCH v5 2/2] sched/fair: update scale invariance of PELT

From: Vincent Guittot
Date: Wed Nov 07 2018 - 07:59:13 EST


On Wed, 7 Nov 2018 at 11:47, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>
> On 11/5/18 10:10 AM, Vincent Guittot wrote:
> > On Fri, 2 Nov 2018 at 16:36, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
> >>
> >> On 10/26/18 6:11 PM, Vincent Guittot wrote:
>
> [...]
>
> >> Thinking about this new approach on a big.LITTLE platform:
> >>
> >> CPU Capacities big: 1024 LITTLE: 512, performance CPUfreq governor
> >>
> >> A 50% (runtime/period) task on a big CPU will become an always running
> >> task on the little CPU. The utilization signals of the task and of the
> >> cfs_rq of the little CPU converge to 1024.
> >>
> >> With contrib scaling the utilization signal of the 50% task converges
> >> to 512 on the little CPU, even though it is always running on it, and
> >> so does that of the cfs_rq.
> >>
> >> Two 25% tasks on a big CPU will become two 50% tasks on a little CPU.
> >> The utilization signal of each task converges to 512 and that of the
> >> cfs_rq of the little CPU converges to 1024.
> >>
> >> With contrib scaling the utilization signal of the 25% tasks converges
> >> to 256 on the little CPU, even though they each run 50% on it, and that
> >> of the cfs_rq converges to 512.
> >>
> >> So what do we consider system-wide invariance? I thought that e.g. a 25%
> >> task should have a utilization value of 256 no matter on which CPU it is
> >> running?
> >>
> >> In both cases, the little CPU is not going idle whereas the big CPU does.
> >
> > IMO, the key point here is that there is no idle time. As soon as
> > there is no idle time, you don't know if a task has enough compute
> > capacity, so you can't tell the difference between the 50% task and
> > an always running task on the little core.
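
To put some numbers on this, here is a rough user-space approximation
of the two scalings (plain C, not kernel code: it just iterates the
PELT geometric series with y^32 = 0.5 in 1ms steps, and leaves
frequency invariance out since the example assumes the performance
governor). Variable names are made up for the illustration; it models
a task that never gets idle time on a 512-capacity CPU:

/*
 * Rough approximation of PELT (y^32 = 0.5, 1ms steps) for a task that
 * never sleeps on a CPU of capacity 512, e.g. the 50% big-CPU task
 * that becomes always running on the little CPU. Not kernel code.
 */
#include <stdio.h>
#include <math.h>

int main(void)
{
	const double y = pow(0.5, 1.0 / 32.0);	/* decay per 1ms period */
	const double cap = 512.0;		/* little CPU capacity */
	const double d = cap / 1024.0;		/* scaled running delta */
	double util_contrib = 0.0;		/* contrib scaling */
	double util_time = 0.0;			/* time scaling */
	int ms;

	for (ms = 1; ms <= 640; ms++) {
		/* contrib scaling: the contribution is scaled by capacity */
		util_contrib = util_contrib * y + cap * (1.0 - y);

		/*
		 * time scaling: the running delta is scaled by capacity
		 * instead, so 1ms of wall time counts as cap/1024 ms of
		 * PELT time at full contribution.
		 */
		util_time = util_time * pow(y, d) +
			    1024.0 * (1.0 - pow(y, d));

		if (ms % 160 == 0)
			printf("%4dms: contrib %3.0f  time %4.0f\n",
			       ms, util_contrib, util_time);
	}
	return 0;
}

With these assumptions the contrib-scaled signal saturates around 512
within a few hundred ms, while the time-scaled one only gets close to
1024 after 600ms or so, which matches the figures discussed here.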
>
> Agreed. My '2 25% tasks on a 512 cpu' was a special example in the sense
> that the tasks would stay invariant since they are not restricted by the
> cpu capacity yet. '2 35% tasks' would also have 256 utilization each
> with contrib scaling so that's not invariant either.
>
> Could we say that in the overutilized case with contrib scaling each of
> the n tasks gets cpu_cap/n utilization whereas with time scaling they
> get 1024/n utilization? Even though there is no value in this
> information because of the over-utilized state.
>
> > It's also interesting to notice that the task will reach the always
> > running state after more than 600ms on the little core, with
> > utilization starting from 0.
> >
> > Then, considering system-wide invariance, the tasks are not really
> > invariant. If we take a 50% task that runs 40ms in a period of 80ms,
> > the max utilization of the task will be 721 on the big core and 512 on
> > the little core.
>
> Agreed, the utilization of the task on the big CPU oscillates between
> 721 and 321 so the average is still ~512.
>
> > Then, if you take a 39ms running task instead, the utilization on the
> > big core will reach 709 but it will be 507 on the little core. So your
> > utilization depends on the current capacity.
>
> OK, but the average should be ~ 507 on big as well. There is idle time

I don't know about the average; the utilization varies between 709 and
292 on the big core and between 507 and 486 on the little core.
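
For reference, the oscillation extrema can be approximated in closed
form: for a task that runs r ms out of a p ms period with a per-ms
contribution c, the steady-state max is c * (1 - y^r) / (1 - y^p) and
the min is max * y^(p - r). The short program below (again only an
approximation of the kernel's accounting, helper name made up)
reproduces roughly the 709/292 and 507/486 figures:

/*
 * Steady-state max/min of a periodic task's utilization with contrib
 * scaling (per-ms contribution scaled by CPU capacity), y^32 = 0.5.
 * Approximation only; the kernel's accounting differs slightly.
 */
#include <stdio.h>
#include <math.h>

static void steady_state(const char *name, double run, double period,
			 double cap)
{
	double y = pow(0.5, 1.0 / 32.0);
	double max = cap * (1.0 - pow(y, run)) / (1.0 - pow(y, period));
	double min = max * pow(y, period - run);

	printf("%-24s max %4.0f min %4.0f\n", name, max, min);
}

int main(void)
{
	/* 39ms of work @1024 runs 39ms on big, 78ms on little */
	steady_state("39ms/80ms big (1024)", 39, 80, 1024);
	steady_state("39ms/80ms little (512)", 78, 80, 512);
	/* 40ms of work @1024 runs 40ms on big, always runs on little */
	steady_state("40ms/80ms big (1024)", 40, 80, 1024);
	return 0;
}

In this simplified model, time scaling gives the 39ms case the same
~709 max on both cores, since the running delta shrinks while the
decay window per period stays 80ms; that is consistent with the
figures quoted below for the new proposal.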

> now even on the little CPU. But yeah, with longer period values, there
> are quite big amplitudes.
>
> > With the new proposal, the max utilization will be 709 on both big and
> > little cores for the 39ms running task. For the 40ms running task, the
> > utilization will be 721 on the big core. Then, if the task moves to the
> > little core, it will reach the value 721 after 80ms, then 900 after
> > more than 160ms and 1000 after 320ms.
>
> We consider max values here? In this case, agreed. So this is a reminder

Yes, we consider the max value as it's what is mainly used, especially
with the util_est feature.

> that even if the average utilization of a task compared to the CPU
> capacity would mark the system as non-overutilized (39ms/80ms on a 512
> CPU), the utilization of that task looks different because of the
> oscillation, which is pretty noticeable with long periods.
>
> The important bit for EAS is that it only uses utilization in the
> non-overutilized case. Here, utilization signals should look the same
> between the two approaches, not considering tasks with long periods like
> the 39/80ms example above.
> There are also some advantages for EAS with time scaling: (1) faster
> overutilization detection when a big task runs on a little CPU, (2)
> higher (initial) task utilization value when this task migrates from a
> little to a big CPU.
>
> We should run our EAS task placement tests with your time scaling patches.
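
To put rough numbers on the EAS points above as well, the same kind of
toy model for the migration case (again only an approximation of the
patch's behaviour; the starting value is the big-CPU steady-state
minimum from the model above, ~303 here versus the ~321 quoted
earlier, and the 80% capacity margin used for the "overutilized" print
is an assumed illustration value, not the exact kernel check) gives
roughly the 721 / ~900 / ~1000 trajectory and the faster
overutilization detection:

/*
 * 40ms/80ms task migrating from the big CPU to the little one, then
 * running there without idle time. Same 1ms-step approximation as
 * above (y^32 = 0.5). The 80% margin is an assumption for the
 * illustration only.
 */
#include <stdio.h>
#include <math.h>

int main(void)
{
	const double y = pow(0.5, 1.0 / 32.0);
	const double cap = 512.0;		/* little CPU capacity */
	const double d = cap / 1024.0;		/* scaled running delta */
	const double margin = 0.8 * cap;	/* assumed OU threshold */
	double contrib = 303.0, timesc = 303.0;	/* util at migration */
	int ou_contrib = 0, ou_time = 0;
	int ms;

	for (ms = 1; ms <= 320; ms++) {
		/* always running on the little CPU, no idle time */
		contrib = contrib * y + cap * (1.0 - y);
		timesc = timesc * pow(y, d) + 1024.0 * (1.0 - pow(y, d));

		if (!ou_contrib && contrib > margin)
			ou_contrib = ms;
		if (!ou_time && timesc > margin)
			ou_time = ms;
		if (ms % 80 == 0)
			printf("%3dms: contrib %3.0f  time %4.0f\n",
			       ms, contrib, timesc);
	}
	printf("crosses 80%% of 512: contrib %dms, time %dms\n",
	       ou_contrib, ou_time);
	return 0;
}

In this model the time-scaled signal reaches ~721 after 80ms, ~900
after 160ms and ~1000 after 320ms, while the contrib-scaled one
settles around 512; the assumed 80% threshold is crossed after ~15ms
versus ~33ms.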