Re: [RFC PATCH v4 0/6] sched/cpufreq: Make schedutil energy aware

From: Douglas Raillard
Date: Wed Mar 11 2020 - 08:41:02 EST




On 2/14/20 1:37 PM, Peter Zijlstra wrote:
> On Thu, Feb 13, 2020 at 05:49:48PM +0000, Douglas Raillard wrote:
>
>>> description of it all somewhere.
>>
>> Now a textual version of it:
>>
>> em_pd_get_higher_freq() does the following:
>>
>> # Turn the abstract cost margin, expressed on the EM_COST_MARGIN_SCALE,
>> # into a concrete value. cost_margin=EM_COST_MARGIN_SCALE gives a concrete
>> # value of "highest_opp_cost", the cost of the highest OPP on that CPU.
>> concrete_margin = (cost_margin * highest_opp_cost) / EM_COST_MARGIN_SCALE
>>
>> # Then it finds the lowest OPP satisfying min_freq:
>> min_opp = OPP_AT_FREQ(min_freq)
>>
>> # It takes the associated cost, adds the concrete margin, and then finds
>> # the highest OPP whose cost is lower than or equal to the result:
>> max_cost = COST_OF(min_opp) + concrete_margin
>>
>> final_freq = MAX(
>>     FREQ_OF(opp)
>>     for opp in available_opps
>>     if COST_OF(opp) <= max_cost
>> )
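The pseudocode above can be turned into a small runnable sketch. It is not the patch's C code: the OPP table is modelled as a Python list sorted by ascending frequency, the value 1024 for EM_COST_MARGIN_SCALE is an assumption made only for illustration, and `Opp` is a hypothetical helper type.

```python
from dataclasses import dataclass

# Assumed value for illustration only; the patch defines its own constant.
EM_COST_MARGIN_SCALE = 1024


@dataclass(frozen=True)
class Opp:
    freq: int  # kHz
    cost: int  # abstract energy-model cost


def em_pd_get_higher_freq(opps, min_freq, cost_margin):
    """Sketch of the selection described above; `opps` sorted by ascending freq."""
    # cost_margin == EM_COST_MARGIN_SCALE maps to the cost of the highest OPP.
    highest_opp_cost = opps[-1].cost
    concrete_margin = (cost_margin * highest_opp_cost) // EM_COST_MARGIN_SCALE

    # Lowest OPP satisfying min_freq (fall back to the highest OPP).
    min_opp = next((opp for opp in opps if opp.freq >= min_freq), opps[-1])

    # Cost budget: cost of min_opp plus the concrete margin.
    max_cost = min_opp.cost + concrete_margin

    # Highest frequency among the OPPs whose cost fits in the budget.
    # min_opp itself always fits, so the sequence is never empty.
    return max(opp.freq for opp in opps if opp.cost <= max_cost)


opps = [Opp(500_000, 100), Opp(1_000_000, 250), Opp(1_500_000, 500), Opp(2_000_000, 1000)]
print(em_pd_get_higher_freq(opps, 800_000, 0))    # -> 1000000
print(em_pd_get_higher_freq(opps, 800_000, 256))  # -> 1500000
```

With a zero margin the request stays at the first OPP above min_freq; a margin of 256/1024 of the top OPP's cost buys one more OPP in this made-up table.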
>
> Right; I got that.
>
>> So this means that:
>> util - util_est_enqueued ~= 0
>
> Only if you assume the task will get scheduled out reasonably frequently.
>
>> => cost_margin ~= 0
>> => concrete_margin ~= 0
>> => max_cost = COST_OF(min_opp) + 0
>> => final_freq = FREQ_OF(min_opp)
>>
>> The effective boost is ~0, so you will get the current behaviour of
>> schedutil.
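The last step of that chain (from max_cost = COST_OF(min_opp) to final_freq = FREQ_OF(min_opp)) silently assumes that no higher OPP is as cheap as min_opp. A worked form of the chain, under the assumption that the EM cost C(o) strictly increases with the frequency F(o), with S = EM_COST_MARGIN_SCALE, C_top the cost of the highest OPP, \delta = concrete_margin, C_max = max_cost, and o_min^+ the next OPP above min_opp:

```latex
\begin{aligned}
\text{cost\_margin} \approx 0
  &\;\Rightarrow\; \delta = \frac{\text{cost\_margin}\cdot C_{\mathrm{top}}}{S} \approx 0 \\
  &\;\Rightarrow\; C_{\max} = C(o_{\min}) + \delta \\
  &\;\Rightarrow\; \{\, o : C(o) \le C_{\max} \,\} = \{\, o : F(o) \le F(o_{\min}) \,\}
     \quad \text{if } \delta < C(o_{\min}^{+}) - C(o_{\min}) \\
  &\;\Rightarrow\; f_{\mathrm{final}} = \max_{o \,:\, C(o) \le C_{\max}} F(o) = F(o_{\min})
\end{aligned}
```

So "~0" has a concrete meaning here: the boost is a no-op as long as the margin stays below the cost gap to the next OPP.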
>
> But the argument holds, because if things don't get scheduled out, we'll
> peg u = 1 and hit f = 1 and all is well anyway.
>
> Which is a useful property; it shows that in the steady state, this
> patch-set is a NOP, but the above argument only relies on 'util_avg >
> util_est' being used as a trigger.

Yes, `util_avg > util_est` can only happen when the task's duty cycle is
changing, which does not happen at steady state.
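To make the "peg u = 1 and hit f = 1" case concrete, here is a sketch of how schedutil roughly turns utilization into a frequency: map_util_freq() applies 25% headroom, (freq + freq/4) * util / cap, and the result is resolved to the lowest OPP at or above it. It assumes frequency-invariant utilization and ignores rate limiting, uclamp and iowait boosting; `schedutil_next_freq` is a hypothetical name, not a kernel function.

```python
def schedutil_next_freq(util, max_cap, max_freq, opp_freqs):
    """Approximate schedutil mapping: 25% headroom, then resolve to an OPP."""
    # Same arithmetic as map_util_freq(): (freq + (freq >> 2)) * util / cap.
    target = (max_freq + (max_freq >> 2)) * util // max_cap
    # Lowest OPP at or above the target, clamped to the highest OPP.
    return next((f for f in sorted(opp_freqs) if f >= target), max(opp_freqs))


opps = [500_000, 1_000_000, 1_500_000, 2_000_000]
print(schedutil_next_freq(1024, 1024, 2_000_000, opps))  # -> 2000000
print(schedutil_next_freq(512, 1024, 2_000_000, opps))   # -> 1500000
```

At full utilization the headroom already pushes the target past the top OPP, so any extra boost on top of it cannot change the outcome.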

Either it's periodic and the boost is legitimate, or it's not periodic
and, for the purpose of boosting, we treat it as a periodic task that is
well represented by its last activation and sleep.

Tasks with a high variability in their activation durations (i.e. not
periodic at all) will likely get more boosting on average. That is
probably good: we can't predict much about them, so when in doubt we
tilt the behaviour of schedutil toward racing to completion.
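A toy illustration of the "more boosting on average" claim. It uses a deliberately crude, hypothetical proxy: the boost at the end of activation i is taken as max(0, d[i] - d[i-1]), i.e. how much longer the task ran than in the activation util_est_enqueued was sampled from. Real PELT and util_est dynamics are more involved; this only shows the direction of the effect.

```python
from statistics import mean


def mean_boost(durations):
    """Average excess of each activation over the previous one (proxy only)."""
    excess = [max(0, cur - prev) for prev, cur in zip(durations, durations[1:])]
    return mean(excess)


periodic = [4] * 8        # constant 4 ms activations
variable = [2, 6] * 4     # same 4 ms mean, but alternating short/long
print(mean_boost(periodic))  # -> 0
print(mean_boost(variable))  # about 2.29 (16/7)
```

Both workloads have the same average activation length, but only the variable one ever exceeds its previous activation, so only it accumulates boost.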

>> If the task starts needing more cycles than during its previous period,
>> `util - util_est_enqueued` will grow like util since util_est_enqueued
>> is constant. The longer the task keeps running, the higher the boost,
>> until it goes to sleep again.
>>
>> At next wakeup, util_est_enqueued has caught up and either:
>> 1) util becomes stable, so no more boosting
>> 2) util keeps increasing, so go for another round of boosting
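The growth of `util - util_est_enqueued` during a single activation can be sketched with a simplified PELT model: while running, util moves geometrically toward 1024 with a decay factor y such that y^32 = 1/2 per ~1 ms period, while util_est_enqueued stays at the value sampled at the previous dequeue. This ignores frequency/capacity invariance and partial-period accounting, and the starting values are made up.

```python
Y = 0.5 ** (1 / 32)  # PELT decay per period: half-life of 32 periods
MAX_UTIL = 1024


def util_after_running(util, periods):
    """Simplified PELT: util converges toward MAX_UTIL while the task runs."""
    for _ in range(periods):
        util = util * Y + MAX_UTIL * (1 - Y)
    return util


def boost_signal(util_start, util_est_enqueued, periods):
    """util - util_est_enqueued after `periods` more periods of running."""
    return max(0.0, util_after_running(util_start, periods) - util_est_enqueued)


for n in (0, 8, 16, 32):
    print(n, round(boost_signal(300, 300, n)))
```

Starting from util = util_est_enqueued = 300, the signal is 0 at first and reaches about 362 after 32 extra periods (1024 - 724/2 - 300), growing for as long as the task keeps running.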
>
> Agreed; however elsewhere you wrote:
>
>> 1) If you care more about predictable battery life (or energy bill) than
>> predictability of the boost feature, EM should be used.
>>
>> 2) If you don't have an EM or you care more about having a predictable
>> boost for a given workload, use util (or disable that boost).
>
> This is the part I'm still not sure about: how the specifics of the
> cost_margin setup lead to 1), or how some frobbing with frequency
> selection would destroy that property.

This should be answered by this other thread:
https://lore.kernel.org/lkml/5d732dc1-d343-24d2-bda9-072021a510ed@xxxxxxx/#t