Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases

From: Patrick Bellasi
Date: Fri Jun 28 2019 - 10:01:03 EST


On 28-Jun 14:38, Peter Zijlstra wrote:
> On Fri, Jun 28, 2019 at 11:08:14AM +0100, Patrick Bellasi wrote:
> > On 26-Jun 13:40, Vincent Guittot wrote:
> > > Hi Patrick,
> > >
> > > On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
> > > >
> > > > The estimated utilization for a task is currently defined based on:
> > > > - enqueued: the utilization value at the end of the last activation
> > > > - ewma: an exponential moving average whose samples are the enqueued values
> > > >
> > > > According to this definition, when a task suddenly changes its bandwidth
> > > > requirements from small to big, the EWMA will need to collect multiple
> > > > samples before converging up to track the new big utilization.
> > > >
> > > > Moreover, after the PELT scale invariance update [1], in the above scenario we
> > > > can see that the utilization of the task has a significant drop from the first
> > > > big activation to the following one. That's implied by the new "time-scaling"
> > >
> > > Could you give us more details about this? I'm not sure to understand
> > > what changes between the 1st big activation and the following one ?
> >
> > We are after a solution for the problem Douglas Raillard discussed at
> > OSPM, specifically the "Task util drop after 1st idle" highlighted in
> > slide 6 of his presentation:
> >
> > http://retis.sssup.it/ospm-summit/Downloads/02_05-Douglas_Raillard-How_can_we_make_schedutil_even_more_effective.pdf
> >
>
> So I see the problem, and I don't hate the patch, but I'm still
> struggling to understand how exactly it relates to the time-scaling
> stuff. Afaict the fundamental problem here is layering two averages. The
> second (EWMA in our case) will always lag/delay the input of the first
> (PELT).
>
> The time-scaling thing might make matters worse, because that helps PELT
> ramp up faster, but that is not the primary issue.

Sure, we like the new time-scaling PELT: it ramps up faster and, as
long as we have idle time, it's better at predicting what the
utilization would be if we were running at max OPP.

However, the experiment above shows that:

- despite the task becoming a 75% task after a certain activation, it
takes multiple activations for PELT to actually enter that range.

- the first activation ends at 665, 10% short wrt the configured
utilization

- while the PELT signal converges toward 75%, we have some pretty
consistent drops at wakeup time, especially after the first big
activation.

> Or am I missing something?

I'm not sure the above happens because of a problem in the new
time-scaling PELT; I actually think it's kind of expected given the
way we re-scale time contributions depending on the current OPPs.

It's just that a drop of 375 in utilization after just 1.1ms of sleep
time looks to me more related to the time-scaling invariance than to
the normal/expected PELT decay.
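
As a rough sanity check on that intuition, here is a minimal sketch of
how much plain PELT decay a 1.1ms sleep would be expected to cause. It
assumes the usual 32ms PELT half-life and a continuous approximation of
the per-period geometric decay. The starting value of 665 is borrowed
from the end of the first activation above purely for illustration; the
function names are hypothetical and this is not kernel code.

```python
import math

# Assumption: PELT halves a signal every 32ms of (scaled) time.
PELT_HALF_LIFE_MS = 32.0


def pelt_decay(util, sleep_ms, half_life_ms=PELT_HALF_LIFE_MS):
    """Utilization left after sleeping sleep_ms, plain geometric decay."""
    return util * 0.5 ** (sleep_ms / half_life_ms)


def sleep_for_drop(util, drop, half_life_ms=PELT_HALF_LIFE_MS):
    """Sleep time (ms) that plain decay needs to lose `drop` from `util`."""
    return half_life_ms * math.log2(util / (util - drop))


start = 665  # illustrative starting point, not a measured value
after = pelt_decay(start, 1.1)
print(f"decay over 1.1ms: {start - after:.1f}")
print(f"sleep needed for a 375 drop: {sleep_for_drop(start, 375):.1f}ms")
```

Under these assumptions a 1.1ms sleep accounts for a decay in the
region of 15-16 units, and a drop of 375 would need several tens of
milliseconds of plain decay.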

Could it be an out-of-sync issue between the PELT time scaling code
and capacity scaling code?
Perhaps due to some OPP changes/notification going wrong?

Sorry for not being much more useful on that, maybe Vincent has some
better ideas.

The only thing I've kind of convinced myself of is that an EWMA on
util_est does not make a lot of sense for tracking increasing
utilization.
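
To illustrate the lag, here is a minimal sketch comparing a plain EWMA
with a variant that jumps straight to the new sample when utilization
increases. The 1/4 weight for new samples is an assumption based on my
recollection of the util_est code, and the ramp-up rule is only one
reading of the idea in the subject line, not the patch itself. All
function names are hypothetical.

```python
def ewma_update(ewma, enqueued, shift=2):
    """Plain EWMA: the new sample gets weight 1/2**shift (1/4 here)."""
    return ((ewma << shift) - ewma + enqueued) >> shift


def fast_ramp_update(ewma, enqueued, shift=2):
    """Follow increases immediately, decay slowly on decreases."""
    if enqueued > ewma:
        return enqueued
    return ewma_update(ewma, enqueued, shift)


def activations_to_track(start, target, update, margin=10, limit=1000):
    """Count activations until the average is within margin of target."""
    ewma = start
    for n in range(1, limit + 1):
        ewma = update(ewma, target)
        if ewma >= target - margin:
            return n
    return None  # never got close enough within `limit` activations


# A task going from ~10% to 75% of a 1024 capacity scale.
small, big = 100, 768
print("plain EWMA:  ", activations_to_track(small, big, ewma_update))
print("fast ramp-up:", activations_to_track(small, big, fast_ramp_update))
```

With these numbers the plain EWMA needs well over ten activations to
get within 10 units of the new value, while the ramp-up variant tracks
it after a single activation and still decays slowly when utilization
goes down.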

Best,
Patrick

--
#include <best/regards.h>

Patrick Bellasi