Re: [PATCH v5 2/2] sched/fair: update scale invariance of PELT

From: Peter Zijlstra
Date: Tue Nov 06 2018 - 10:00:11 EST


On Mon, Nov 05, 2018 at 02:58:54PM +0000, Morten Rasmussen wrote:

> It has always been debatable what to do with utilization when there are
> no spare cycles.
>
> In Dietmar's example, where two 25% tasks are put on a 512 (50%) capacity
> CPU, we add just enough utilization to have no spare cycles left. One
> could argue that 25% is still the correct utilization for those tasks.
> However, we only know their true utilization because they just ran
> unconstrained on a higher capacity CPU. Once they are on the 512 capacity
> CPU we wouldn't know if the tasks grew in utilization as there are no
> spare cycles to use.
>
> As I see it, the most fundamental difference between scaling
> contribution and time for PELT is the characteristics when CPUs are
> over-utilized.
>
> With contribution scaling the PELT utilization of a task is a _minimum_
> utilization. Regardless of where the task is currently/was running (and
> provided that it doesn't change behaviour) its PELT utilization will
> approximate its _minimum_ utilization on an idle 1024 capacity CPU.
>
> With time scaling the PELT utilization doesn't really have a meaning on
> its own. It has to be compared to the capacity of the CPU where it
> is/was running to know what its current PELT utilization means. When
> the utilization over-shoots the capacity, its value no longer
> represents utilization; it just means that the task has a higher compute
> demand than is offered on its current CPU, and a high value means that it
> has been suffering longer. It can't be used to predict the actual
> utilization on an idle 1024 capacity CPU any better than contribution-scaled
> PELT utilization.
>
> This change might not be a showstopper, but it is something to be aware
> of and take into account wherever PELT utilization is used.

So for things like x86, where we don't have immediate control over the
OPPs, nor actually know (or even can know) the current dynamic max OPP, I
much prefer the model Vincent proposes.

The one thing we do know is the lack of idle time, and I feel equating
no idle with u=1 makes perfect sense.

Luckily, x86 does provide a means of querying the current effective OPP,
so one utilization value can usefully be compared to another.