Re: [RFC][PATCH v021 4/9] sched/topology: Adjust cpufreq checks for EAS

From: Christian Loehle
Date: Wed Dec 11 2024 - 06:44:09 EST


On 12/11/24 11:29, Rafael J. Wysocki wrote:
> On Wed, Dec 11, 2024 at 11:33 AM Christian Loehle
> <christian.loehle@xxxxxxx> wrote:
>>
>> On 11/29/24 16:00, Rafael J. Wysocki wrote:
>>> From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>>
>>> Make it possible to use EAS with cpufreq drivers that implement the
>>> :setpolicy() callback instead of using generic cpufreq governors.
>>>
>>> This is going to be necessary for using EAS with intel_pstate in its
>>> default configuration.
>>>
>>> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
>>> ---
>>>
>>> This is the minimum of what's needed, but I'd really prefer to move
>>> the cpufreq vs EAS checks into cpufreq because messing around cpufreq
>>> internals in topology.c feels like a butcher shop kind of exercise.
>>
>> Makes sense, something like cpufreq_eas_capable().
>>
>>>
>>> Besides, as I said before, I remain unconvinced about the usefulness
>>> of these checks at all. Yes, one is supposed to get the best results
>>> from EAS when running schedutil, but what if they just want to try
>>> something else with EAS? What if they can get better results with
>>> that other thing, surprisingly enough?
>>
>> How do you imagine this to work then?
>> I assume we don't make any 'resulting-OPP-guesses' like
>> sugov_effective_cpu_perf() for any of the setpolicy governors.
>> Neither for dbs and I guess userspace.
>> What about standard powersave and performance?
>> Do we just have a cpufreq callback to ask which OPP to use for
>> the energy calculation? Assume lowest/highest?
>> (I don't think there is hardware where lowest/highest makes a
>> difference, so maybe not bothering with the complexity could
>> be an option, too.)
>
> In the "setpolicy" case there is no way to reliably predict the OPP
> that is going to be used, so why bother?
>
> In the other cases, and if the OPPs are actually known, EAS may still
> make assumptions regarding which of them will be used that will match
> the schedutil selection rules, but if the cpufreq governor happens to
> choose a different OPP, this is not the end of the world.

"Not the end of the world" as in the model making incorrect assumptions.
With the significant power-performance overlaps we see in mobile systems,
taking sugov's guess while using powersave/performance (the !setpolicy
case) will, at the very least, lead to worse decisions.
See the first slide here for reference:
https://lpc.events/event/16/contributions/1194/attachments/1114/2139/LPC2022_Energy_model_accuracy.pdf
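
To make the overlap argument concrete, here is a toy userspace sketch
(not kernel code). Everything in it is an assumption made up for
illustration: the two perf domains, the OPP capacities and power
numbers, the helper names, and the simplified "power * util / capacity"
energy estimate. The only thing borrowed from schedutil is the 1.25x
headroom used when guessing the OPP.

```c
#include <assert.h>
#include <stddef.h>

/* One operating point: capacity it provides and power it draws (made-up units). */
struct opp { double cap; double power; };

/* Toy perf domain: one CPU, an OPP table sorted by capacity, current util. */
struct perf_domain {
	const struct opp *opps;
	size_t nr_opps;
	double util;
};

/* schedutil-like guess: lowest OPP covering util plus 25% headroom. */
static const struct opp *guess_opp(const struct perf_domain *pd, double util)
{
	assert(pd->nr_opps > 0);
	for (size_t i = 0; i < pd->nr_opps; i++)
		if (pd->opps[i].cap >= util * 1.25)
			return &pd->opps[i];
	return &pd->opps[pd->nr_opps - 1];
}

/* Energy estimate for one domain; pinned_max models the performance governor. */
static double pd_energy(const struct perf_domain *pd, double util, int pinned_max)
{
	const struct opp *o = pinned_max ? &pd->opps[pd->nr_opps - 1]
					 : guess_opp(pd, util);
	return o->power * util / o->cap;
}

/* Total energy if the task is placed on domain 'target' (0 = little, 1 = big). */
static double placement_energy(const struct perf_domain pds[2], double task_util,
			       int target, int pinned_max)
{
	double total = 0.0;

	for (int i = 0; i < 2; i++) {
		double util = pds[i].util + (i == target ? task_util : 0.0);
		total += pd_energy(&pds[i], util, pinned_max);
	}
	return total;
}

/* Index of the domain with the lower estimated total energy. */
static int cheaper_domain(const struct perf_domain pds[2], double task_util,
			  int pinned_max)
{
	return placement_energy(pds, task_util, 0, pinned_max) <=
	       placement_energy(pds, task_util, 1, pinned_max) ? 0 : 1;
}

/* Hypothetical OPP tables: little's top OPP is less efficient than big's mid OPP. */
static const struct opp little_opps[] = { {150, 30}, {300, 90}, {450, 250} };
static const struct opp big_opps[]    = { {400, 150}, {700, 280}, {1024, 1000} };

static const struct perf_domain toy_pds[2] = {
	{ little_opps, 3, 100.0 },	/* little, already at util 100 */
	{ big_opps,    3, 200.0 },	/* big, already at util 200 */
};
```

With these made-up numbers and a task of util 150,
cheaper_domain(toy_pds, 150.0, 0) picks big (the schedutil-style guess
has little jumping to its inefficient top OPP), while
cheaper_domain(toy_pds, 150.0, 1) picks little once every domain is
assumed to sit at its highest OPP. So a model that keeps using sugov's
guess under the performance governor would place the task on the wrong
domain in this toy setup.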

What about the config space? Are you fine with everything relying on
CONFIG_CPU_FREQ_GOV_SCHEDUTIL?