Re: [PATCH] sched/pelt: Fix task util_est update filtering

From: Vincent Donnefort
Date: Mon Feb 22 2021 - 04:32:36 EST


On Fri, Feb 19, 2021 at 11:19:05AM +0100, Dietmar Eggemann wrote:
> On 16/02/2021 17:39, vincent.donnefort@xxxxxxx wrote:
> > From: Vincent Donnefort <vincent.donnefort@xxxxxxx>
> >
> > Being called for each dequeue, util_est reduces the number of its updates
> > by filtering out when the EWMA signal is different from the task util_avg
> > by less than 1%. It is a problem for a sudden util_avg ramp-up. Due to the
> > decay from a previous high util_avg, EWMA might now be close enough to
> > the new util_avg. No update would then happen, leaving ue.enqueued
> > with an out-of-date value.
>
> (1) enqueued[x-1] < ewma[x-1]
>
> (2) diff(enqueued[x], ewma[x]) < 1024/100 && enqueued[x] < ewma[x] (*)
>
> with ewma[x-1] == ewma[x]
>
> (*) enqueued[x] must still be less than ewma[x] w/ default
> UTIL_EST_FASTUP. Otherwise we would already 'goto done' (write the new
> util_est) via the previous if condition.
>
> >
> > Taking into consideration the two util_est members, EWMA and enqueued for
> > the filtering, ensures, for both, an up-to-date value.
> >
> > This is for now an issue only for the trace probe that might return the
> > stale value. Functional-wise, it isn't (yet) a problem, as the value is
> > always accessed through max(enqueued, ewma).
>
> Yeah, I remember that the ue.enqueued plots looked weird in these
> sections with stale ue.enqueued values.
>
> > This problem has been observed using LISA's UtilConvergence:test_means on
> > the sd845c board.
>
> I ran the test a couple of times on my juno board and I never hit this
> path (util_est_within_margin(last_ewma_diff) &&
> !util_est_within_margin(last_enqueued_diff)) for a test task.
>
> I can't see how this issue can be board specific? Does it happen
> reliably on sd845c or is it just that it happens very, very occasionally?

This is indeed not board specific. It just happened to be observed on that
one. And even there, it only happens every once in a while.
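
To make the case concrete, here is a minimal user-space sketch of the
filtering. It is not the kernel code: the names ue_model,
old_filter_would_update() and new_filter_would_update() are made up for the
example, the UTIL_EST_FASTUP path is left out, and the margin is assumed to be
1024/100 as in the condition quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define SCHED_CAPACITY_SCALE	1024
/* Assumed margin: about 1% of the capacity scale, i.e. 10. */
#define UTIL_EST_MARGIN		(SCHED_CAPACITY_SCALE / 100)

/* Hypothetical stand-in for the two util_est members. */
struct ue_model {
	unsigned int enqueued;	/* task util at the previous dequeue */
	unsigned int ewma;	/* moving average of those values */
};

/* True when the difference is strictly smaller than the margin. */
static bool within_margin(int diff)
{
	return abs(diff) < UTIL_EST_MARGIN;
}

/* Filtering on EWMA only: update when util moved away from ewma. */
static bool old_filter_would_update(struct ue_model ue, unsigned int util)
{
	int last_ewma_diff = (int)util - (int)ue.ewma;

	return !within_margin(last_ewma_diff);
}

/* Filtering on both members: also update when enqueued would go stale. */
static bool new_filter_would_update(struct ue_model ue, unsigned int util)
{
	int last_ewma_diff = (int)util - (int)ue.ewma;
	int last_enqueued_diff = (int)ue.enqueued - (int)util;

	return !within_margin(last_ewma_diff) ||
	       !within_margin(last_enqueued_diff);
}

/* Ramp-up case: ewma still high from an earlier phase, enqueued low. */
static void ramp_up_example(void)
{
	struct ue_model ue = { .enqueued = 200, .ewma = 400 };
	unsigned int util = 395;	/* new task util, just below ewma */

	/* |395 - 400| = 5 < 10: the EWMA-only filter skips the update. */
	assert(!old_filter_would_update(ue, util));
	/* |200 - 395| = 195 >= 10: the combined filter lets it through. */
	assert(new_filter_would_update(ue, util));
}
```

With these made-up numbers, the EWMA-only check would leave enqueued at 200
while the task util is 395, which is the stale value a trace probe would then
report.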

>
> I saw it a couple of times but always with (non-test) tasks migrating
> from one CPU to another.
>
> > Signed-off-by: Vincent Donnefort <vincent.donnefort@xxxxxxx>
>
> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@xxxxxxx>

Thanks!

>
> [...]