Re: [PATCH v7 2/2] sched/fair: update scale invariance of PELT

From: Vincent Guittot
Date: Fri Jan 11 2019 - 10:05:04 EST


On Thu, 10 Jan 2019 at 16:30, Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
>
> On 29-Nov 17:19, Vincent Guittot wrote:
> > On Thu, 29 Nov 2018 at 16:00, Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
> > > On 29-Nov 11:43, Vincent Guittot wrote:
>
> [...]
>
> > > Seems we agree that, when there is no idle time:
> > > - the two 15% tasks will be overestimated
> > > - their utilization will reach 50% after a while
> > >
> > > If I'm not wrong, we will have:
> > > - 30% CPU util in ~16ms @1024 capacity
> > > ~64ms @256 capacity
> > >
> > > Thus, the tasks will be certainly over-estimated after ~64ms.
> > > Is that correct ?
> >
> > From a pure util_avg pov it's correct
> > But i'd like to weight that a bit with the example below
> >
> > > Now, we can argue that 64ms is a pretty long time and thus it's quite
> > > unlucky we will have no idle for such a long time.
> > >
> > > Still, I'm wondering if we should keep collecting those samples or
> > > better find a way to detect that and skip the sampling.
> >
> > The problem is that you can have util_avg above capacity even with idle time
> > In the 1st example of this thread, the 39ms/80ms task will reach 709
> > which is the value saved by util_est on a big core
> > But on core with half capacity, there is still idle time so 709 is a
> > correct value although above 512
>
> Right, I see your point and (in principle) I like the idea of
> collecting samples for tasks which happen to run at a lower capacity
> then required and the utilization value makes sense...
>
> > In fact, max will be always above the linear ratio because it's based
> > on geometric series
> >
> > And this is true even with 15.6ms/32ms (same ratio as above) task
> > although the impact is smaller (max value, which should be saved by
> > util est, becomes 587 in this case).
>
> However that's not always the case... as per my example above.
>
> Moreover, we should also consider that util_est is mainly meant to be
> a lower-bound for tasks utilization.
> That's why task_util_est() already returns the actual util_avg when
> it's higher than the estimated utilization.

I can imagine that the fact that we use max(util_avg, util_est) helps
to keep using correct utilization in the scheduler when util_avg goes
above cpu capacity whereas there is still idle time

>
> With your new signal and without any special check on samples
> collection, if a task is limited because of thermal capping for
> example, we could end up overestimating its utilization and thus
> perhaps generating an unwanted frequency spike when the capping is
> relaxed... and (even worst) it will take some more activations for the
> estimated utilization to converge back to the actual utilization.
>
> Since we cannot easily know if there is idle time in a CPU when a task
> completes an activation with a utilization higher then the CPU
> capacity, I would better prefer to just skip the sampling with
> something like:
>
> ---8<---
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 9332863d122a..485053026533 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3639,6 +3639,7 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
> {
> long last_ewma_diff;
> struct util_est ue;
> + int cpu;
>
> if (!sched_feat(UTIL_EST))
> return;
> @@ -3672,6 +3673,14 @@ util_est_dequeue(struct cfs_rq *cfs_rq, struct task_struct *p, bool task_sleep)
> if (within_margin(last_ewma_diff, (SCHED_CAPACITY_SCALE / 100)))
> return;
>
> + /*
> + * To avoid overestimation of actual task utilization, skip updates if
> + * we cannot grant there is idle time in this CPU.
> + */
> + cpu = cpu_of(rq_of(cfs_rq));
> + if (task_util(p) > cpu_capacity(cpu))
> + return;
> +
> /*
> * Update Task's estimated utilization
> *
> ---8<---
>
> At least this will ensure that util_est always provides an actual
> measured lower bound for a task utilization.
>
> If you think this makes sense, feel free to add such a patch on
> top of your series.

ok. I'm going to add it when rebasing the series

Thanks
Vincent
>
> Cheers Patrick
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi