Re: [PATCH v5 05/10] cpufreq/schedutil: get max utilization

From: Patrick Bellasi
Date: Thu May 31 2018 - 06:27:47 EST



Hi Vincent, Juri,

On 28-May 18:34, Vincent Guittot wrote:
> On 28 May 2018 at 17:22, Juri Lelli <juri.lelli@xxxxxxxxxx> wrote:
> > On 28/05/18 16:57, Vincent Guittot wrote:
> >> Hi Juri,
> >>
> >> On 28 May 2018 at 12:12, Juri Lelli <juri.lelli@xxxxxxxxxx> wrote:
> >> > Hi Vincent,
> >> >
> >> > On 25/05/18 15:12, Vincent Guittot wrote:
> >> >> Now that we have both the dl class bandwidth requirement and the dl class
> >> >> utilization, we can use the max of the 2 values when aggregating the
> >> >> utilization of the CPU.
> >> >>
> >> >> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> >> >> ---
> >> >> kernel/sched/sched.h | 6 +++++-
> >> >> 1 file changed, 5 insertions(+), 1 deletion(-)
> >> >>
> >> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> >> >> index 4526ba6..0eb07a8 100644
> >> >> --- a/kernel/sched/sched.h
> >> >> +++ b/kernel/sched/sched.h
> >> >> @@ -2194,7 +2194,11 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
> >> >> #ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
> >> >> static inline unsigned long cpu_util_dl(struct rq *rq)
> >> >> {
> >> >> - return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> >> >> + unsigned long util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> >> >
> >> > I'd be tempted to say the we actually want to cap to this one above
> >> > instead of using the max (as you are proposing below) or the
> >> > (theoretical) power reduction benefits of using DEADLINE for certain
> >> > tasks might vanish.
> >>
> >> The problem that I'm facing is that the sched_entity bandwidth is
> >> removed after the 0-lag time, so rq->dl.running_bw goes back to
> >> zero; but if the DL task has preempted a CFS task, the utilization of
> >> the CFS task will be lower than reality and schedutil will select a lower
> >> OPP even though the CPU is still always running.
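For reference, the aggregation the quoted patch proposes can be sketched as standalone C, using the kernel's SCHED_CAPACITY_SCALE (1024) and BW_SHIFT (20) constants; the helper name and its freestanding form are mine for illustration, not the kernel function itself:

```c
#include <assert.h>

#define SCHED_CAPACITY_SCALE 1024UL
#define BW_SHIFT 20

/*
 * Standalone sketch of the proposed cpu_util_dl(): scale the DL class
 * bandwidth requirement (running_bw, in 1/2^BW_SHIFT units) up to
 * capacity units, then take the max with the measured DL utilization.
 */
static unsigned long cpu_util_dl_sketch(unsigned long running_bw,
					unsigned long dl_util_avg)
{
	unsigned long util = (running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;

	return util > dl_util_avg ? util : dl_util_avg;
}
```

The max is what keeps the OPP up after the 0-lag time: once running_bw drops to zero, the measured utilization term still reflects the work the DL task actually did.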

With UTIL_EST enabled I don't expect an OPP reduction below the
expected utilization of a CFS task.

IOW, when a periodic CFS task is preempted by a DL one, what we use
for OPP selection once the DL task is over is still the estimated
utilization of the CFS task itself. Thus, schedutil will eventually
(since we have quite conservative down-scaling thresholds) go down to
the right OPP to serve that task.
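The UTIL_EST behaviour described above can be illustrated with a minimal sketch (a freestanding simplification of what schedutil considers for CFS, not the kernel code): OPP selection uses the max of the decayed instantaneous utilization and the estimate built from previous activations, so a preempted periodic task keeps driving the OPP via its estimate.

```c
#include <assert.h>

/*
 * Minimal sketch: with UTIL_EST, the CFS utilization considered for
 * frequency selection is the max of util_avg (which decays while the
 * task is preempted) and the estimated utilization remembered from
 * previous activations of the task.
 */
static unsigned long cpu_util_cfs_sketch(unsigned long util_avg,
					 unsigned long util_est)
{
	return util_avg > util_est ? util_avg : util_est;
}
```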

> >> The example with an RT task described in the cover letter can be
> >> run with a DL task and will give similar results.

In the cover letter you say:

A rt-app use case which creates an always-running CFS thread and an
RT thread that wakes up periodically, with both threads pinned to the
same CPU, shows a lot of frequency switches on that CPU even though
it never goes idle during the test.

I would say that's a quite specific corner case where your always-running
CFS task has never accumulated a util_est sample.

Do we really have these cases in real systems?

Otherwise, it seems to me that we are trying to solve quite specific
corner cases by adding a non-negligible level of "complexity".

Moreover, I also have the impression that we can fix these
use-cases by:

- improving the way we accumulate samples in util_est
i.e. by discarding preemption time

- maybe by improving the utilization aggregation in schedutil to
better understand DL requirements
i.e. a 10% utilization with a 100ms running time is very different
from the same utilization with a 1ms running time
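To make that last point concrete: two DL reservations can report the same
bandwidth to schedutil while having very different running patterns. The
numbers below are hypothetical (in microseconds), and the helper is just
an illustration of the runtime/period ratio, not kernel code:

```c
#include <assert.h>

/* Bandwidth of a DL reservation as a percentage: runtime over period. */
static unsigned long dl_bandwidth_pct(unsigned long runtime_us,
				      unsigned long period_us)
{
	return (runtime_us * 100) / period_us;
}
```

Both a 100ms-every-1s reservation and a 1ms-every-10ms reservation report
10%, yet the first occupies the CPU in 100ms stretches; the bandwidth
figure alone does not capture that difference.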


--
#include <best/regards.h>

Patrick Bellasi