Re: [PATCH 1/2] sched/schedutil: rework performance estimation

From: Dietmar Eggemann
Date: Thu Oct 26 2023 - 05:08:06 EST


On 20/10/2023 15:58, Vincent Guittot wrote:
> On Fri, 20 Oct 2023 at 11:48, Dietmar Eggemann <dietmar.eggemann@xxxxxxx> wrote:
>>
>> On 13/10/2023 17:14, Vincent Guittot wrote:

[...]

>>> A new sugov_effective_cpu_perf() interface is also available to compute
>>> the final performance level that is targeted for the CPU after applying
>>> some cpufreq headroom and taking into account all inputs.
>>>
>>> With these 2 functions, schedutil is now able to decide when it must go
>>> above uclamp hints. It now also has a generic way to get the min
>>> performance level.
>>>
>>> The dependency between energy model and cpufreq governor and its headroom
>>> policy doesn't exist anymore.
>>
>> But the dependency that both are doing the same thing still exists, right?
>
> For the energy model itself, it is now fully removed; only EAS still
> has to estimate which perf level will be selected by schedutil, but it
> now uses a schedutil function without having to care about headroom
> and cpufreq governor policy.

I see now. (1) replaces (2), so only the schedutil <-> EAS dependency
remains; the EM dependency is gone.

compute_energy()

  max_util = eenv_pd_max_util()

    sugov_effective_cpu_perf()

      actual = map_util_perf(actual)        (1)

  energy = em_cpu_energy(..., max_util, ...);

    max_util = map_util_perf(max_util)      (2)
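
To make (1) concrete, here is a minimal user-space sketch of how I read the
new helper. It is not the patch itself: headroom_sketch() stands in for
map_util_perf() (util plus 25%), and effective_perf_sketch() is a
hypothetical name modelling what sugov_effective_cpu_perf() is described as
doing, under the assumption that it adds headroom to the actual utilization,
caps the result with max and then raises it to min.

```c
/* Stand-in for the kernel's map_util_perf(): add 25% headroom. */
static unsigned long headroom_sketch(unsigned long util)
{
	return util + (util >> 2);
}

/*
 * Hypothetical model of the final performance target (assumption, not
 * the code from the patch): headroom on actual, capped by max, and
 * never below min.
 */
static unsigned long effective_perf_sketch(unsigned long actual,
					   unsigned long min,
					   unsigned long max)
{
	actual = headroom_sketch(actual);

	/* No need to target more than the headroom-adjusted utilization. */
	if (actual < max)
		max = actual;

	/* The minimum is a hard requirement, so it wins over max. */
	return min > max ? min : max;
}
```

With this model, EAS only needs to call the one helper, and the headroom
policy stays inside schedutil.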

[...]

>>> unsigned long effective_cpu_util(int cpu, unsigned long util_cfs,
>>> - enum cpu_util_type type,
>>> - struct task_struct *p)
>>> + unsigned long *min,
>>> + unsigned long *max)
>>
>> FREQUENCY_UTIL relates to *min != NULL and *max != NULL
>>
>> ENERGY_UTIL relates to *min == NULL and *max == NULL
>>
>> so both must be either NULL or !NULL.
>>
>> Calling it with one equal to NULL and the other !NULL should be
>> undefined, right?
>
> For now there is no user, but one could consider asking for only min or
> max. So I would not say undefined but unused.

OK.
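
For reference, the "unused but not undefined" behaviour could look like the
sketch below. The function name and the floor/cap parameters are
hypothetical and only illustrate independent NULL checks on the two output
pointers; the real effective_cpu_util() derives these values from the rq.

```c
#include <stddef.h>

/*
 * Hypothetical helper (illustration only): each output pointer is
 * checked on its own, so a caller may ask for min, max, both or neither.
 */
static unsigned long util_with_bounds_sketch(unsigned long util,
					     unsigned long floor,
					     unsigned long cap,
					     unsigned long *min,
					     unsigned long *max)
{
	if (min)
		*min = floor;
	if (max)
		*max = cap;

	return util;
}
```

Passing NULL for both pointers matches the old ENERGY_UTIL case, and
passing both matches FREQUENCY_UTIL.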

[...]

>>> - * OTOH, for energy computation we need the estimated running time, so
>>> - * include util_dl and ignore dl_bw.
>>> - */
>>> - if (type == ENERGY_UTIL)
>>> - util += dl_util;
>>> + if (util >= scale) {
>>> + if (max)
>>> + *max = scale;
>>
>> But that means that uclamp_max cannot constrain a system in which
>> 'util > uclamp_max'. I guess that's related to you saying uclamp_min is a
>> hard req and uclamp_max is a soft req. I don't think that's in sync with
>> the rest of the uclamp_max implementation.
>
> That's a mistake, I made a shortcut here. I wanted to save the
> scale_irq_capacity() step but forgot to update max first.
>
> Will fix it

I see.
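
One possible ordering that avoids the problem, as a user-space sketch only:
the function name and the explicit uclamp_max parameter are hypothetical,
and this is my assumption of what the fix could look like, not the actual
next version. The point is that max is clamped by uclamp_max before the
saturated early return, so 'util >= scale' can no longer bypass the clamp.

```c
#include <stddef.h>

/*
 * Hypothetical model (assumption, not the patch): update *max with the
 * uclamp_max hint first, then take the shortcut for a saturated CPU.
 */
static unsigned long util_capped_sketch(unsigned long util,
					unsigned long scale,
					unsigned long uclamp_max,
					unsigned long *max)
{
	/* Clamp max first so the early return below cannot skip it. */
	if (max)
		*max = uclamp_max < scale ? uclamp_max : scale;

	/* Saturated CPU: skip the remaining scaling steps. */
	if (util >= scale)
		return scale;

	return util;
}
```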

[...]

>> effective_cpu_util for FREQUENCY_UTIL (i.e. (*min != NULL && *max !=
>> NULL)) is slightly different.
>>
>> missing:
>>
>> if (!uclamp_is_used() && rt_rq_is_runnable(&rq->rt))
>> return max
>>
>> probably moved into sugov_effective_cpu_perf() (which is only called
>> for `FREQUENCY_UTIL`) ?
>
> yes, it's in sugov_effective_cpu_perf()

OK.

[...]

>>> @@ -306,7 +329,7 @@ static inline bool sugov_cpu_is_busy(struct sugov_cpu *sg_cpu) { return false; }
>>> */
>>> static inline void ignore_dl_rate_limit(struct sugov_cpu *sg_cpu)
>>> {
>>> - if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_dl)
>>> + if (cpu_bw_dl(cpu_rq(sg_cpu->cpu)) > sg_cpu->bw_min)
>>
>> bw_min is more than DL right?
>
> yes
>
> Interrupts preempt DL, so we should include them.
> And now that we can take uclamp_min into account, use it when
> computing the min perf parameter of cpufreq_driver_adjust_perf().

OK.