Re: [PATCH v5 05/10] cpufreq/schedutil: get max utilization

From: Juri Lelli
Date: Mon May 28 2018 - 11:22:58 EST


On 28/05/18 16:57, Vincent Guittot wrote:
> Hi Juri,
>
> On 28 May 2018 at 12:12, Juri Lelli <juri.lelli@xxxxxxxxxx> wrote:
> > Hi Vincent,
> >
> > On 25/05/18 15:12, Vincent Guittot wrote:
> >> Now that we have both the dl class bandwidth requirement and the dl class
> >> utilization, we can use the max of the 2 values when aggregating the
> >> utilization of the CPU.
> >>
> >> Signed-off-by: Vincent Guittot <vincent.guittot@xxxxxxxxxx>
> >> ---
> >> kernel/sched/sched.h | 6 +++++-
> >> 1 file changed, 5 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> >> index 4526ba6..0eb07a8 100644
> >> --- a/kernel/sched/sched.h
> >> +++ b/kernel/sched/sched.h
> >> @@ -2194,7 +2194,11 @@ static inline void cpufreq_update_util(struct rq *rq, unsigned int flags) {}
> >> #ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL
> >> static inline unsigned long cpu_util_dl(struct rq *rq)
> >> {
> >> - return (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> >> + unsigned long util = (rq->dl.running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT;
> >
> > I'd be tempted to say that we actually want to cap to this one above
> > instead of using the max (as you are proposing below) or the
> > (theoretical) power reduction benefits of using DEADLINE for certain
> > tasks might vanish.
>
> The problem I'm facing is that the sched_entity bandwidth is removed
> after the 0-lag time, so rq->dl.running_bw goes back to zero. But if
> the DL task has preempted a CFS task, the utilization of the CFS task
> will be lower than in reality, and schedutil will select a lower OPP
> even though the CPU is running all the time. The example with an RT
> task described in the cover letter can be run with a DL task and will
> give similar results.
> avg_dl.util_avg tracks the utilization of the rq as seen by the
> scheduler, whereas rq->dl.running_bw gives the minimum needed to meet
> the DL requirement.

Mmm, I see. Note that I'm only being cautious; what you propose might
work OK, but it seems to me that we might lose some of the benefits of
running tasks with DEADLINE if we start selecting the frequency as you
propose even when such tasks are running.
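
To make the two options concrete, here is a minimal userspace sketch
(not kernel code) comparing "max" with "cap". The helper names
bw_to_util(), dl_util_max() and dl_util_capped() are made up for
illustration, the constants mirror the kernel's values, and "cap" is
read here as min(util_avg, running_bw util), which is only one possible
interpretation:

```c
#include <stdint.h>

/* Mirror the kernel's fixed-point constants (assumed values for this sketch). */
#define SCHED_CAPACITY_SCALE 1024UL /* full capacity of a CPU */
#define BW_SHIFT 20                 /* running_bw is a fraction scaled by 1 << 20 */

/* Same conversion as the first line of cpu_util_dl() in the quoted patch. */
static unsigned long bw_to_util(uint64_t running_bw)
{
	return (unsigned long)((running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT);
}

/* "max" option: never report less than what the scheduler has observed. */
static unsigned long dl_util_max(uint64_t running_bw, unsigned long util_avg)
{
	unsigned long bw_util = bw_to_util(running_bw);

	return util_avg > bw_util ? util_avg : bw_util;
}

/* "cap" option (hypothetical reading): never report more than the reserved bandwidth. */
static unsigned long dl_util_capped(uint64_t running_bw, unsigned long util_avg)
{
	unsigned long bw_util = bw_to_util(running_bw);

	return util_avg < bw_util ? util_avg : bw_util;
}
```

With a 25% reservation (running_bw = 262144) and an observed util_avg
of 400, "max" reports 400 and "cap" reports 256. Once running_bw has
dropped to zero after the 0-lag time, "cap" reports 0, which is the
case described above, while "max" still reports 400.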

An idea might be to copy running_bw util into dl.util_avg when a DL task
goes to sleep, and then decay the latter as for the RT contribution.
What do you think?
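
A rough userspace model of that idea, just to illustrate the intended
behaviour. Everything here is an assumption made for the sketch: the
struct and the model_*() helpers are hypothetical names, floating point
is used for readability (the kernel uses fixed-point arithmetic), the
decay factor is the PELT one (y^32 = 0.5), and taking the larger of the
two values at the end is just one way the seeded signal could be
consumed:

```c
#include <stdint.h>

#define SCHED_CAPACITY_SCALE 1024UL /* mirrors the kernel value */
#define BW_SHIFT 20                 /* mirrors the kernel value */
#define PELT_Y 0.97857206           /* per-period decay factor, y^32 ~= 0.5 */

/* Hypothetical, stripped-down stand-in for the DL part of a runqueue. */
struct dl_rq_model {
	uint64_t running_bw; /* active DL bandwidth, scaled by 1 << BW_SHIFT */
	double util_avg;     /* decaying utilization signal */
};

static double bw_to_util(uint64_t running_bw)
{
	return (double)((running_bw * SCHED_CAPACITY_SCALE) >> BW_SHIFT);
}

/* DL task goes to sleep: seed util_avg from the bandwidth it was using. */
static void model_dl_task_sleep(struct dl_rq_model *dl)
{
	dl->util_avg = bw_to_util(dl->running_bw);
}

/* 0-lag time reached: the task's bandwidth is removed from running_bw. */
static void model_zero_lag(struct dl_rq_model *dl)
{
	dl->running_bw = 0;
}

/* Decay util_avg geometrically over a number of elapsed periods. */
static void model_decay(struct dl_rq_model *dl, unsigned int periods)
{
	while (periods--)
		dl->util_avg *= PELT_Y;
}

/* Value reported for frequency selection: larger of the two, rounded. */
static unsigned long model_cpu_util_dl(const struct dl_rq_model *dl)
{
	double bw_util = bw_to_util(dl->running_bw);
	double util = dl->util_avg > bw_util ? dl->util_avg : bw_util;

	return (unsigned long)(util + 0.5);
}
```

In this model a task with a 25% reservation still contributes 256 right
after the 0-lag time, and about half of that 32 periods later, instead
of dropping to zero immediately.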