Re: [PATCH] sched/fair: util_est: fast ramp-up EWMA on utilization increases

From: Patrick Bellasi
Date: Fri Jun 28 2019 - 06:08:24 EST

On 26-Jun 13:40, Vincent Guittot wrote:
> Hi Patrick,
> On Thu, 20 Jun 2019 at 17:06, Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
> >
> > The estimated utilization for a task is currently defined based on:
> > - enqueued: the utilization value at the end of the last activation
> > - ewma: an exponential moving average whose samples are the enqueued values
> >
> > According to this definition, when a task suddenly changes its bandwidth
> > requirements from small to big, the EWMA will need to collect multiple
> > samples before converging up to track the new big utilization.
> >
> > Moreover, after the PELT scale invariance update [1], in the above scenario we
> > can see that the utilization of the task has a significant drop from the first
> > big activation to the following one. That's implied by the new "time-scaling"
> Could you give us more details about this? I'm not sure to understand
> what changes between the 1st big activation and the following one ?

We are after a solution for the problem Douglas Raillard discussed at
OSPM, specifically the "Task util drop after 1st idle" highlighted in
slide 6 of his presentation:

which shows what happens when a task switches from 5% to 75% and
we get these start/end values for each activation:

Act  Time      __comm   __cpu  __pid  task     util_avg
  1  2.813559  <idle>       4      0  step_up        45
     2.902624  step_up      4   2574  step_up       665
  2  2.903722  <idle>       4      0  step_up       289
     2.917385  step_up      4   2574  step_up       452
  3  2.919725  <idle>       4      0  step_up       418
     2.953764  step_up      4   2574  step_up       658
  4  2.954248  <idle>       4      0  step_up       537
     2.967955  step_up      4   2574  step_up       645
  5  2.970248  <idle>       4      0  step_up       597
     2.983914  step_up      4   2574  step_up       692
  6  2.986248  <idle>       4      0  step_up       640
     2.999924  step_up      4   2574  step_up       725
  7  3.002248  <idle>       4      0  step_up       670
     3.015872  step_up      4   2574  step_up       749
  8  3.018248  <idle>       4      0  step_up       694
     3.030474  step_up      4   2574  step_up       767
  9  3.034247  <idle>       4      0  step_up       710
     3.046454  step_up      4   2574  step_up       780

Since the first activation is running at lower-than-max OPPs we do
"time-scaling" at the end of the activation. Util_avg starts at 45
and ramps up to 665 but then it drops 376 units down to 289 at the
beginning of the second activation.

The second activation has a chance to run at higher OPPs, but still
not at max. Util_avg starts at 289 and ramps up to 452, which is even
lower than the previous max value, but then it drops 34 units down to
418 at the beginning of the third activation.

The following activations have a similar pattern, but util_avg
converges toward the final value: we run almost always at the highest
OPP and the drops are defined mainly by the expected PELT decay.

> The utilization implied by new "time-scaling" should be the same as
> always running at max frequency with previous method

Right, the problem we are tackling with this patch however is to
make util_est a better signal for the ramp-up phases.

Right now util_est "fixes" only the second activation, since:

max(util_avg, last_value, ewma) =
max(289, 665, <289) = 665

and thus we keep running on the highest OPP we reached at the end of
the first activation.

While at the start of the third activation:

max(util_avg, last_value, ewma) =
max(418, 452, <452) = 452

and this time we drop the OPP quite a lot even though the signal is
still ramping up.
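
To make the two cases above concrete, here is a minimal sketch in plain
C. It is not the kernel code: est_util(), check_trace_examples() and
EWMA_GUESS are made-up names, and EWMA_GUESS is just an arbitrary value
below 289 standing in for the "<289" and "<452" ewma values, which the
trace does not report.

```c
#include <assert.h>

/* Hypothetical helper, not a kernel function: pick the largest signal. */
static unsigned int est_util(unsigned int util_avg,
			     unsigned int last_value,
			     unsigned int ewma)
{
	unsigned int est = util_avg;

	if (last_value > est)
		est = last_value;
	if (ewma > est)
		est = ewma;

	return est;
}

/* Arbitrary stand-in for the unreported ewma value (assumption). */
#define EWMA_GUESS 100

/* Replay the two cases using the util_avg values from the trace. */
static void check_trace_examples(void)
{
	/* Start of 2nd activation: last_value (665) holds the estimate up. */
	assert(est_util(289, 665, EWMA_GUESS) == 665);
	/* Start of 3rd activation: nothing above 452 is left to hold it. */
	assert(est_util(418, 452, EWMA_GUESS) == 452);
}
```

The point of the sketch is only that last_value protects a single
activation: once it is overwritten by a lower sample, the estimate
falls back to whatever util_avg and the slow ewma can offer.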

> > mechanisms instead of the previous "delta-scaling" approach.
> >

That happens because the EWMA takes multiple activations to converge
up, which means it does not help much:

> > Unfortunately, these drops cannot be fully absorbed by the current util_est
> > implementation. Indeed, the low-frequency filtering introduced by the "ewma" is
> > entirely useless while converging up and it does not help in stabilizing sooner
> > the PELT signal.
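
As a rough illustration of how slowly the average converges up, here is
a small C sketch. It assumes each new sample gets a weight of 1/4
(EWMA_SHIFT is my name for that assumption), an initial ewma of 45 and
a constant stream of 665 samples. Real activations do not produce
constant samples, so take the numbers as indicative only; ewma_step()
and samples_to_reach() are hypothetical helpers.

```c
#include <assert.h>

#define EWMA_SHIFT 2	/* assumed: each new sample weighs 1/4 */

/* One integer EWMA step: ewma = (3 * ewma + sample) / 4. */
static unsigned int ewma_step(unsigned int ewma, unsigned int sample)
{
	return (((1u << EWMA_SHIFT) - 1) * ewma + sample) >> EWMA_SHIFT;
}

/* Count how many identical samples it takes for ewma to reach target. */
static unsigned int samples_to_reach(unsigned int ewma,
				     unsigned int sample,
				     unsigned int target)
{
	unsigned int n = 0;

	/*
	 * Integer rounding can stall the average up to 3 units below
	 * the sample, so only targets at least 3 below it are reachable.
	 */
	assert(target + 3 <= sample);

	while (ewma < target) {
		ewma = ewma_step(ewma, sample);
		n++;
	}

	return n;
}
```

Under these assumptions samples_to_reach(45, 665, 600) returns 8, that
is eight activations before the average gets within about 10% of the
new utilization.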

The idea of the patch is to exploit two observations:

1. the default scheduler behavior is to be performance oriented
2. the longer you run a task underprovisioned, the higher the
   util_avg will be

Which turns into:

> > To make util_est do better service in the above scenario, do change its
> > definition to slow down only utilization decreases. Do that by resetting the
> > "ewma" every time the last collected sample increases.
> >
> > This change makes also the default util_est implementation more aligned with
> > the major scheduler behavior, which is to optimize for performance.
> > In the future, this implementation can be further refined to consider
> > task specific hints.
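
In code, the idea can be sketched roughly as follows. This is a
simplified model and not the patch itself: struct util_est_sketch and
util_est_sketch_update() are made-up names, "the last collected sample
increases" is read here as "the new sample is above the current ewma",
and the 1/4 weight used for decreases is an assumption.

```c
#include <assert.h>

/* Simplified stand-in for the per-task estimate (hypothetical type). */
struct util_est_sketch {
	unsigned int enqueued;	/* util at the end of the last activation */
	unsigned int ewma;	/* moving average of the enqueued samples */
};

/* Record a new sample: jump up immediately, decay slowly. */
static void util_est_sketch_update(struct util_est_sketch *ue,
				   unsigned int sample)
{
	assert(ue);

	ue->enqueued = sample;

	/* Utilization increase: reset the average to the new sample. */
	if (sample > ue->ewma) {
		ue->ewma = sample;
		return;
	}

	/* Utilization decrease: ewma = (3 * ewma + sample) / 4. */
	ue->ewma = (3 * ue->ewma + sample) >> 2;
}
```

Feeding it the end-of-activation values from the trace, starting from
an ewma of 45: the 665 sample lifts ewma straight to 665, and the
following 452 sample only brings it down to 611, so the estimate stays
close to the highest value seen while the signal is still ramping up.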


#include <best/regards.h>

Patrick Bellasi