Re: [PATCH] sched/fair: schedutil: update only with all info available

From: Patrick Bellasi
Date: Wed Apr 11 2018 - 10:34:08 EST


On 11-Apr 13:56, Vincent Guittot wrote:
> On 11 April 2018 at 12:15, Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
> > On 11-Apr 08:57, Vincent Guittot wrote:
> >> On 10 April 2018 at 13:04, Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
> >> > On 09-Apr 10:51, Vincent Guittot wrote:
> >> >> On 6 April 2018 at 19:28, Patrick Bellasi <patrick.bellasi@xxxxxxx> wrote:
> >> >> Peter,
> >> >> what was your goal with adding the condition "if
> >> >> (rq->cfs.h_nr_running)" for the aggregation of CFS utilization?
> >> >
> >> > The original intent was to get rid of sched class flags, used to track
> >> > which class has tasks runnable from within schedutil. The reason was
> >> > to solve some misalignment between scheduler class status and
> >> > schedutil status.
> >>
> >> This was mainly for RT tasks, but it was not the case for cfs tasks
> >> before commit 8f111bc357aa
> >
> > True, but with his solution Peter has actually come up with a unified
> > interface which is now (and can be IMO) based just on RUNNABLE
> > counters for each class.
>
> But do we really want to only take care of runnable counters for all classes?

Perhaps, once we have PELT RT support with your patches we can
consider blocked utilization also for those tasks...

However, we can also argue for a different policy: trigger updates
based on RUNNABLE counters, and leave it up to schedutil to decide
for how long to ignore a frequency drop, using a step-down holding
timer similar to what we already have. That can also be a possible
solution.
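
To make the holding-timer idea concrete, here is a minimal userspace
sketch. It is not schedutil code: struct hold_policy, its fields and
hold_policy_pick() are hypothetical names made up for illustration, and
the exact rule for restarting the timer is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-policy state; none of these names exist in schedutil. */
struct hold_policy {
	unsigned int cur_freq;   /* frequency currently requested (kHz) */
	uint64_t last_change_ns; /* when cur_freq was last set or confirmed */
	uint64_t down_hold_ns;   /* how long a lower request is ignored */
};

/*
 * Return the frequency to request at now_ns, given next_freq as computed
 * from RUNNABLE-only utilization. Increases (and equal requests) apply
 * immediately and restart the hold window; decreases apply only once the
 * hold window has fully elapsed.
 */
static unsigned int hold_policy_pick(struct hold_policy *hp,
				     unsigned int next_freq, uint64_t now_ns)
{
	/* Time is assumed to be monotonic. */
	assert(now_ns >= hp->last_change_ns);

	/* Still inside the hold window: ignore the frequency drop. */
	if (next_freq < hp->cur_freq &&
	    now_ns - hp->last_change_ns < hp->down_hold_ns)
		return hp->cur_freq;

	hp->cur_freq = next_freq;
	hp->last_change_ns = now_ns;
	return hp->cur_freq;
}
```

With a 20ms hold, a drop requested 5ms after the last change is
ignored, while the same request at 20ms goes through.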

I can also see an interesting per-task tuning of such a policy.
For example, for certain tasks we may want a longer down-scaling
throttle time, which can instead be shorter if only "background"
tasks are currently active.
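
One possible reading of that per-task tuning, as a rough sketch: each
task carries a hint for how long a frequency drop should be delayed, and
the effective value is the longest one among the currently runnable
tasks. struct task_hint and effective_down_hold_ns() are hypothetical;
no such per-task attribute exists today, and taking the maximum is just
one assumption about how the hints could be combined.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-task hint: how long to delay a frequency drop. */
struct task_hint {
	uint64_t down_hold_ns;
};

/*
 * Pick the hold time for a CPU from its runnable tasks: the longest hint
 * wins, so a single latency-sensitive task keeps the frequency up, while
 * a CPU running only "background" tasks (short hints) scales down sooner.
 * With no runnable tasks, fall back to default_ns.
 */
static uint64_t effective_down_hold_ns(const struct task_hint *runnable,
					size_t nr, uint64_t default_ns)
{
	uint64_t hold;
	size_t i;

	if (nr == 0)
		return default_ns;

	hold = runnable[0].down_hold_ns;
	for (i = 1; i < nr; i++) {
		if (runnable[i].down_hold_ns > hold)
			hold = runnable[i].down_hold_ns;
	}
	return hold;
}
```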

> >> > The solution, initially suggested by Viresh, and finally proposed by
> >> > Peter, was to exploit RQ knowledge directly from within schedutil.
> >> >
> >> > The problem is that schedutil updates now depend on two pieces of
> >> > information: utilization changes and number of RT and CFS runnable tasks.
> >> >
> >> > Thus, using cfs_rq::h_nr_running is not the problem... it's actually
> >> > part of a much cleaner solution than the code we used to have.
> >>
> >> So there are 2 problems there:
> >> - using cfs_rq::h_nr_running when aggregating cfs utilization, which
> >> generates a lot of frequency drops
> >
> > You mean because we now completely disregard the blocked utilization
> > when a CPU is idle, right?
>
> yes
>
> >
> > Given how PELT works and the recent support for IDLE CPU updates, we
> > should probably always add contributions for the CFS class.
> >
> >> - making sure that the nr_running counters are up-to-date when used in schedutil
> >
> > Right... but, if we always add the cfs_rq (to always account for
> > blocked utilization), we no longer have this last dependency,
> > do we?
>
> yes
>
> >
> > We still have to account for the util_est dependency.
> >
> > Should I add a patch to this series to disregard cfs_rq::h_nr_running
> > from schedutil as you suggested?
>
> It's probably better to have a separate patch as these are 2 different topics
> - when updating cfs_rq::h_nr_running and when calling cpufreq_update_util
> - should we use runnable or running utilization for CFS

Yes, well... since OSPM is just next week, we can also have a better
discussion there and decide by then.

What is true so far is that using RUNNABLE is a change with respect to
the previous behavior, which unfortunately went unnoticed until now.
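
To spell out the difference, here is a simplified userspace model of the
two aggregation policies being discussed. It is only a sketch: struct
cpu_model and both helpers are made-up names, the fields merely mirror
the quantities mentioned in this thread (rt_nr_running, h_nr_running,
CFS and DL utilization), and it is not meant to reproduce the kernel
code line by line.

```c
#include <stdint.h>

/* Hypothetical snapshot of one CPU; not a kernel structure. */
struct cpu_model {
	unsigned int rt_nr_running;    /* runnable RT tasks */
	unsigned int cfs_h_nr_running; /* runnable CFS tasks */
	unsigned long util_cfs;        /* CFS utilization, incl. blocked */
	unsigned long util_dl;         /* DL utilization */
	unsigned long max;             /* CPU capacity */
};

static unsigned long clamp_to_max(unsigned long util, unsigned long max)
{
	return util < max ? util : max;
}

/* RUNNABLE-based: CFS utilization counts only if a CFS task is runnable. */
static unsigned long aggregate_runnable_only(const struct cpu_model *c)
{
	unsigned long util;

	if (c->rt_nr_running)
		return c->max;

	util = c->util_dl;
	if (c->cfs_h_nr_running)
		util += c->util_cfs;
	return clamp_to_max(util, c->max);
}

/* Always add the CFS contribution, so blocked utilization is kept. */
static unsigned long aggregate_with_blocked(const struct cpu_model *c)
{
	if (c->rt_nr_running)
		return c->max;

	return clamp_to_max(c->util_dl + c->util_cfs, c->max);
}
```

For a CPU that has just gone idle with util_cfs = 400 still pending as
blocked utilization, the first helper reports 0 while the second
reports 400. That gap is where the additional frequency drops come from.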

--
#include <best/regards.h>

Patrick Bellasi