Re: [RFC PATCH v2 4/6] sched/fair: Introduce an energy estimation helper function

From: Leo Yan
Date: Fri Apr 20 2018 - 12:28:13 EST


On Fri, Apr 20, 2018 at 03:42:45PM +0100, Quentin Perret wrote:
> Hi Leo,
>
> On Wednesday 18 Apr 2018 at 20:15:47 (+0800), Leo Yan wrote:
> > Sorry I introduce mess at here to spread my questions in several
> > replying, later will try to ask questions in one replying. Below are
> > more questions which it's good to bring up:
> >
> > The code for energy computation is quite neat and simple, but I think
> > the energy computation mixes two concepts for CPU util: one concept is
> > the estimated CPU util which is used to select CPU OPP in schedutil,
> > another concept is the raw CPU util according to CPU real running time;
> > for example, cpu_util_next() predicts CPU util but this value might be
> > much higher than cpu_util(), especially after enabled UTIL_EST feature
> > (I have shallow understanding for UTIL_EST so correct me as needed);
>
> I'm not sure I understand what you mean by higher than cpu_util()
> here ... In which case would that happen ?

With the UTIL_EST feature enabled, cpu_util_next() can return a higher
value than cpu_util(); see 'util = max(util, util_est);' in the code
below. As a result, cpu_util_next() takes into account the extra
compensation introduced by UTIL_EST.

	if (sched_feat(UTIL_EST)) {
		util_est = READ_ONCE(cfs_rq->avg.util_est.enqueued);
		if (dst_cpu == cpu)
			util_est += _task_util_est(p);
		else
			util_est = max_t(long, util_est - _task_util_est(p), 0);
		util = max(util, util_est);
	}
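
To make the effect of the max() concrete, here is a minimal user-space
sketch of the same combination. It is not the kernel code: the structs
cpu_model and task_model, the helper max_long() and the function
model_cpu_util_next() are hypothetical names invented for illustration,
and the util_avg add/subtract step before the UTIL_EST branch is an
assumption based on the description of cpu_util_next() in this thread.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins for scheduler state; not kernel API. */
struct cpu_model {
	long util_avg;          /* models cfs_rq->avg.util_avg */
	long util_est_enqueued; /* models cfs_rq->avg.util_est.enqueued */
};

struct task_model {
	long util;     /* models task_util(p) */
	long util_est; /* models _task_util_est(p) */
};

static long max_long(long a, long b)
{
	return a > b ? a : b;
}

/*
 * Predicted utilisation of one CPU once the task has moved.
 * is_dst:           true if this CPU receives the task, false if it loses it.
 * util_est_enabled: models sched_feat(UTIL_EST).
 */
static long model_cpu_util_next(const struct cpu_model *c,
				const struct task_model *t,
				bool is_dst, bool util_est_enabled)
{
	long util = c->util_avg;

	/* Assumed step: add or remove the task's PELT contribution. */
	if (is_dst)
		util += t->util;
	else
		util = max_long(util - t->util, 0);

	if (util_est_enabled) {
		long util_est = c->util_est_enqueued;

		if (is_dst)
			util_est += t->util_est;
		else
			util_est = max_long(util_est - t->util_est, 0);

		/* The larger of the two signals wins, as in the snippet above. */
		util = max_long(util, util_est);
	}

	return util;
}
```

With util_avg = 200, util_est.enqueued = 300 and a task of util = 100,
util_est = 250, the destination CPU is predicted at 300 without UTIL_EST
and at 550 with it, which is the gap I am asking about.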

> cpu_util_next() is basically used to figure out what will be the
> cpu_util() of CPU A after task p has been enqueued on CPU B (no matter
> what A and B are).

As described above, cpu_util_next() is not the same thing as
cpu_util(): cpu_util_next() takes into account the extra compensation
introduced by UTIL_EST.

> > but this patch simply computes CPU capacity and energy with the single
> > one CPU utilization value (and it will be an inflated value after enabling
> > UTIL_EST). Is this purposed for simple implementation?
> >
> > IMHO, cpu_util_next() can be used to predict CPU capacity, on the other
> > hand, should we use the CPU util without UTIL_EST capping for 'sum_util',
> > this can be more reasonable to reflect the CPU utilization?
>
> Why would a decayed utilisation be a better estimate of the time that
> a task is going to spend on a CPU ?

IIUC, in the scheduler wake-up path task_util() is the task utilisation
from before the task went to sleep, so it is not a decayed value.
cpu_util() is a decayed value, but isn't that just what we want, to
reflect the CPU's historic utilisation over the recent past? This is
the reason I suggested using 'cpu_util() + task_util()' as the estimate.

I understand this patch tries to use the pre-decayed value; please check
whether the example below has an issue or not:
if one CPU's cfs_rq->avg.util_est.enqueued is quite high and this CPU
then enters the idle state and sleeps for a long while, then using
cfs_rq->avg.util_est.enqueued to estimate the CPU utilisation might
deviate a lot from the CPU's real run time once we place the waking
task on it? On the other hand, cpu_util() can decay over the CPU's idle
time...
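
As a rough illustration of how far a decayed value can drift from a
pre-decayed one, below is a simplified model of the PELT geometric decay,
assuming the usual half-life of 32 periods (y^32 = 0.5, roughly 32 ms).
The names PELT_DECAY_PER_PERIOD, model_decay_util() and
model_estimate_gap() are hypothetical, and the sketch only models the
decay of a utilisation value over idle periods; it does not check
whether util_est.enqueued really stays high across idle.

```c
/* y such that y^32 == 0.5: a contribution halves every 32 periods. */
#define PELT_DECAY_PER_PERIOD 0.978572062087700

/* Utilisation left after 'idle_periods' periods with no running time. */
static double model_decay_util(double util, unsigned int idle_periods)
{
	while (idle_periods--)
		util *= PELT_DECAY_PER_PERIOD;

	return util;
}

/*
 * Gap between a value frozen at sleep time (pre-decayed) and the same
 * value after it has decayed for 'idle_periods' periods.
 */
static double model_estimate_gap(double util_at_sleep,
				 unsigned int idle_periods)
{
	return util_at_sleep - model_decay_util(util_at_sleep, idle_periods);
}
```

Under these assumptions a value of 800 decays to about 400 after 32 idle
periods and to about 200 after 64, so the two estimates differ by about
600 after 64 periods.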

> > Furthermore, if we consider RT thread is running on CPU and connect with
> > 'schedutil' governor, the CPU will run at maximum frequency, but we
> > cannot say the CPU has 100% utilization. The RT thread case is not
> > handled in this patch.
>
> Right, we don't account for RT tasks in the OPP prediction for now.
> Vincent's patches to have a util_avg for RT runqueues could help us
> do that I suppose ...

Good to know this.

> Thanks !
> Quentin