Re: [PATCH] cpufreq: schedutil: add up/down frequency transition rate limits
From: Peter Zijlstra
Date: Mon Nov 21 2016 - 10:26:22 EST
On Mon, Nov 21, 2016 at 02:59:19PM +0000, Patrick Bellasi wrote:
> A fundamental problem, IMO, is that we are trying to use a "dynamic
> metric" to act as a "predictor".
> PELT is a "dynamic metric" since it continuously changes while a task
> is running. Thus it does not really provide an answer to the question
> "how big is this task?" _while_ the task is running.
> Such information is available only when the task sleeps.
> Indeed, only when the task completes an activation and goes to sleep
> PELT has reached a value which represents how much CPU bandwidth has
> been required by that task.
I'm not sure I agree with that. We can only tell how big a task is
_while_ it's running, esp. since its behaviour is not steady-state. Tasks
can change, etc.
Also, as per the whole argument on why peak_util was bad, at the moment
a task goes to sleep, the PELT signal is actually an over-estimate,
since it hasn't yet had time to average out.
And a real predictor requires a crystal-ball instruction, but until such
time that hardware people bring us that goodness, we'll have to live
with predicting the near future based on the recent past.
> For example, if we consider the simple yet interesting case of a
> periodic task, PELT is a wobbling signal which reports a correct
> measure of how much bandwidth is required only when a task completes
> its RUNNABLE status.
It's actually an over-estimate at that point, since it just added a
sizable chunk to the signal (for having been runnable) that hasn't yet
had time to decay back to the actual value.
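To put a number on that over-estimate, here is a rough sketch. It is not
the kernel's PELT code: it assumes a simplified geometric average with
1 ms steps, a 32 ms half-life and a 1024 scale, and the name
simulate_pelt is made up for illustration. It models a task running
30 ms out of every 100 ms and compares the value at the moment the task
goes to sleep with the average over one period.

```python
# Simplified PELT-like model (assumption: 1 ms steps, 32 ms half-life,
# 1024 scale; the real kernel code uses ~1 ms periods and integer math).
Y = 0.5 ** (1 / 32)  # per-step decay factor, so that Y**32 == 0.5


def simulate_pelt(run_ms, period_ms, periods=50):
    """Return the utilization trace of the last period, once settled."""
    util = 0.0
    trace = []
    for _ in range(periods):
        for t in range(period_ms):
            running = t < run_ms
            # Decay the old signal, then add this step's contribution.
            util = util * Y + (1024 * (1 - Y) if running else 0.0)
            trace.append(util)
    return trace[-period_ms:]


last = simulate_pelt(run_ms=30, period_ms=100)
lo, hi = min(last), max(last)
avg = sum(last) / len(last)
at_sleep = last[30 - 1]  # value right after the last running step

print(f"range {lo:.0f}:{hi:.0f}, average {avg:.0f}, at sleep {at_sleep:.0f}")
```

Under these assumptions the signal swings between roughly 120 and 550
with a mean near 307, and the value sampled at sleep time is the peak of
that swing, not the mean.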
> To be more precise, the correct value is provided by the average PELT
> and this also depends on the period of the task compared to the
> PELT rate constant.
> But still, to me a fundamental point is that the "raw PELT value" is
> not really meaningful in _each and every single point in time_.
> All that considered, we should be aware that to properly drive
> schedutil and (in the future) the energy aware scheduler decisions we
> perhaps need something better instead: a "predictor".
> In the simple case of the periodic task, a good predictor should be
> something which always reports the same answer _in each point in
So the problem with this is that not many tasks are that periodic, and
any filter you put on top will add, let's call it, momentum to the
signal. A reluctance to change. This might negatively affect
In any case, worth trying, see what happens.
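A small sketch of what that momentum looks like, assuming the filter is
a plain exponentially weighted moving average (one possible choice, not
something proposed in this thread; settle_samples and its parameters are
hypothetical). It counts how many samples the filter needs to cover 90%
of a step in demand from 300 to 800.

```python
def settle_samples(alpha, old=300.0, new=800.0, fraction=0.9, limit=1000):
    """Count EWMA updates needed to cover `fraction` of a step old -> new."""
    value = old
    target = old + fraction * (new - old)
    for n in range(1, limit + 1):
        # Standard EWMA update: move a fixed fraction towards the input.
        value += alpha * (new - value)
        if value >= target:
            return n
    return None  # did not settle within `limit` samples


for alpha in (0.5, 0.125):
    print(f"alpha={alpha}: {settle_samples(alpha)} samples to reach 90%")
```

The smoother the filter (smaller alpha), the longer it keeps reporting
the old demand after the task has changed behaviour.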
> For example, a task running 30 [ms] every 100 [ms] is a ~300 util_avg
> task. With PELT, we get a signal which ranges between [120,550] with an
> average of ~300 which is instead completely ignored. By capping the
> decay we will get:
>   decay_cap [ms]    range    average
>                0  120:550        300
>               64  140:560        310
>               32  320:660        430
> which means that still the raw PELT signal is wobbling and never
> provides a consistent response to drive decisions.
> Thus, a "predictor" should be something which samples information from
> PELT to provide a more consistent view, a sort of low-pass filter
> on top of the "dynamic metric" which is PELT.
> Should not such a "predictor" help solve some of the issues
> related to PELT slow ramp-up or fast ramp-down?
I think intel_pstate recently added a local PID filter. I asked at the
time if something like that should live in generic code; looks like
maybe it should.
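For the sake of discussion, a minimal sketch of a filter on top of the
signal. This is not what intel_pstate does and not a proposed patch: it
assumes a simplified PELT-like model (1 ms steps, 32 ms half-life, 1024
scale), averages it over each activation period, and feeds those means
through an EWMA. All names (pelt_trace, period_means, low_pass) are
hypothetical.

```python
Y = 0.5 ** (1 / 32)  # assumed per-ms decay factor (32 ms half-life)


def pelt_trace(run_ms, period_ms, periods):
    """Simplified PELT-like signal for a strictly periodic task."""
    util, trace = 0.0, []
    for _ in range(periods):
        for t in range(period_ms):
            util = util * Y + (1024 * (1 - Y) if t < run_ms else 0.0)
            trace.append(util)
    return trace


def period_means(trace, period_ms):
    """Average the raw signal over each wake-up-to-wake-up period."""
    return [sum(trace[i:i + period_ms]) / period_ms
            for i in range(0, len(trace), period_ms)]


def low_pass(samples, alpha=0.5):
    """EWMA over the per-period means, seeded with the first sample."""
    value, out = samples[0], []
    for s in samples:
        value += alpha * (s - value)
        out.append(value)
    return out


raw = pelt_trace(run_ms=30, period_ms=100, periods=30)
filtered = low_pass(period_means(raw, period_ms=100))

settled_raw = raw[-100:]
print(f"raw swing:  {min(settled_raw):.0f}..{max(settled_raw):.0f}")
print(f"filtered:   {filtered[-1]:.0f}")
```

For the 30 ms / 100 ms task the filtered value sits near 307 once the
warm-up is over, while the raw signal keeps swinging by more than 400.
The price is the reluctance to change mentioned earlier: a task that
stops being periodic is only noticed one period (and a few filter
steps) later.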