Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases

From: Vincent Guittot
Date: Fri Jun 28 2019 - 09:52:04 EST


On Fri, 28 Jun 2019 at 14:38, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, Jun 28, 2019 at 11:08:14AM +0100, Patrick Bellasi wrote:
> > On 26-Jun 13:40, Vincent Guittot wrote:
> > > Hi Patrick,
> > >
> > > On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
> > > >
> > > > The estimated utilization for a task is currently defined based on:
> > > > - enqueued: the utilization value at the end of the last activation
> > > > - ewma: an exponential moving average whose samples are the enqueued values
> > > >
> > > > According to this definition, when a task suddenly changes its bandwidth
> > > > requirements from small to big, the EWMA will need to collect multiple
> > > > samples before converging up to track the new big utilization.
> > > >
> > > > Moreover, after the PELT scale invariance update [1], in the above scenario we
> > > > can see that the utilization of the task has a significant drop from the first
> > > > big activation to the following one. That's implied by the new "time-scaling"
> > >
> > > Could you give us more details about this? I'm not sure I understand
> > > what changes between the 1st big activation and the following one?
> >
> > We are after a solution for the problem Douglas Raillard discussed at
> > OSPM, specifically the "Task util drop after 1st idle" highlighted in
> > slide 6 of his presentation:
> >
> > http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf
> >
>
> So I see the problem, and I don't hate the patch, but I'm still
> struggling to understand how exactly it relates to the time-scaling
> stuff. Afaict the fundamental problem here is layering two averages. The

AFAICT, it's not related to the time-scaling.

In fact, the big 1st activation happens because the task runs at a low
OPP and doesn't have enough time to finish its running phase before the
next one is due to begin. This means that the task runs several
computation phases in one go, so it no longer behaves like a 75% task.
From a PELT PoV, the task is far larger than a 75% task, and so is its
utilization, because it runs far longer (even after scaling time with
frequency). Once the CPU reaches an OPP high enough to leave a sleep
phase between running phases, the task's load tracking comes back to
the normal ramp-up slope (the one that would have been seen if the task
had jumped from 5% to 75% while already running at max OPP).
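
To make the low-OPP effect above concrete, here is a toy model (not
kernel code). The 16ms period and the 12ms of work per period are
numbers I picked to get a 75% task; run_time_ms() and sleep_time_ms()
are hypothetical helpers, and the only thing borrowed from the kernel
is the 1024 capacity scale.

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024 /* capacity of a CPU at max OPP */

/*
 * Toy model: wall-clock time needed to finish work_ms of computation
 * (measured at max OPP) on a CPU whose current capacity is cap.
 */
unsigned int run_time_ms(unsigned int work_ms, unsigned int cap)
{
	assert(cap > 0 && cap <= SCHED_CAPACITY_SCALE);
	return work_ms * SCHED_CAPACITY_SCALE / cap;
}

/*
 * Sleep time left in one period. Returns 0 when the activation overruns
 * the period, i.e. when the running phases end up back to back.
 */
unsigned int sleep_time_ms(unsigned int work_ms, unsigned int period_ms,
			   unsigned int cap)
{
	unsigned int run = run_time_ms(work_ms, cap);

	return run < period_ms ? period_ms - run : 0;
}
```

With these numbers, sleep_time_ms(12, 16, 512) is 0: at half capacity
the 12ms of work needs 24ms, so there is no sleep phase and the
activations merge. At max OPP, sleep_time_ms(12, 16, 1024) leaves the
expected 4ms of sleep per period.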

> second (EWMA in our case) will always lag/delay the input of the first
> (PELT).
>
> The time-scaling thing might make matters worse, because that helps PELT
> ramp up faster, but that is not the primary issue.
>
> Or am I missing something?
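
On the layering point, a rough sketch of how much the second average
lags on its own. This assumes each new enqueued sample gets a 1/4
weight in the EWMA (which I believe is what UTIL_EST_WEIGHT_SHIFT
gives); ewma_update() and samples_to_reach() are hypothetical helpers,
not the kernel's util_est code, and the 64-iteration cap is only there
to keep the loop bounded.

```c
#define UTIL_EST_WEIGHT_SHIFT 2 /* assumption: a new sample weighs 1/4 */

/* One EWMA step: ewma = (3 * ewma + enqueued) / 4, in integer arithmetic. */
unsigned int ewma_update(unsigned int ewma, unsigned int enqueued)
{
	return ((ewma << UTIL_EST_WEIGHT_SHIFT) - ewma + enqueued)
		>> UTIL_EST_WEIGHT_SHIFT;
}

/*
 * Number of identical enqueued samples needed before the EWMA reaches
 * threshold. Capped at 64 because integer rounding can make the EWMA
 * settle slightly below the sample value.
 */
unsigned int samples_to_reach(unsigned int ewma, unsigned int enqueued,
			      unsigned int threshold)
{
	unsigned int n = 0;

	while (ewma < threshold && n < 64) {
		ewma = ewma_update(ewma, enqueued);
		n++;
	}
	return n;
}
```

Starting from an EWMA of 51 (about 5%) with the enqueued value stuck at
768 (75%), samples_to_reach(51, 768, 691) returns 8, so it takes 8
activations for the EWMA to get within 10% of the sample, however fast
PELT itself ramps up.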