Re: [RFC][PATCH v021 4/9] sched/topology: Adjust cpufreq checks for EAS

From: Rafael J. Wysocki
Date: Wed Dec 11 2024 - 06:29:30 EST


On Wed, Dec 11, 2024 at 11:33 AM Christian Loehle
<christian.loehle@xxxxxxx> wrote:
>
> On 11/29/24 16:00, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > Make it possible to use EAS with cpufreq drivers that implement the
> > :setpolicy() callback instead of using generic cpufreq governors.
> >
> > This is going to be necessary for using EAS with intel_pstate in its
> > default configuration.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > ---
> >
> > This is the minimum of what's needed, but I'd really prefer to move
> > the cpufreq vs EAS checks into cpufreq because messing around with
> > cpufreq internals in topology.c feels like a butcher shop kind of exercise.
>
> Makes sense, something like cpufreq_eas_capable().
>
> >
> > Besides, as I said before, I remain unconvinced about the usefulness
> > of these checks at all. Yes, one is supposed to get the best results
> > from EAS when running schedutil, but what if they just want to try
> > something else with EAS? What if they can get better results with
> > that other thing, surprisingly enough?
>
> How do you imagine this to work then?
> I assume we don't make any 'resulting-OPP-guesses' like
> sugov_effective_cpu_perf() for any of the setpolicy governors.
> Neither for dbs and I guess userspace.
> What about standard powersave and performance?
> Do we just have a cpufreq callback to ask which OPP to use for
> the energy calculation? Assume lowest/highest?
> (I don't think there is hardware where lowest/highest makes a
> difference, so maybe not bothering with the complexity could
> be an option, too.)

In the "setpolicy" case there is no way to reliably predict the OPP
that is going to be used, so why bother?

In the other cases, and if the OPPs are actually known, EAS may still
make assumptions regarding which of them will be used that will match
the schedutil selection rules, but if the cpufreq governor happens to
choose a different OPP, this is not the end of the world.

Yes, you could have been more energy-efficient had you chosen to use
schedutil, but you chose otherwise and that's what you get.
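
To make the mismatch concrete, here is a minimal user-space sketch, not
kernel code. It assumes a made-up three-entry OPP table, a schedutil-style
selection rule with roughly 25% headroom, and an energy estimate of the
form power * util / capacity. All model_* names and all numbers are
hypothetical and only illustrate the reasoning above.

```c
#include <stddef.h>

/* Hypothetical performance state; not the kernel's energy model structure. */
struct model_opp {
	unsigned long freq_khz;
	unsigned long capacity;	/* CPU capacity at this OPP, 0..1024 scale */
	unsigned long power;	/* abstract power units, made up */
};

/* Made-up table, sorted by ascending frequency. */
static const struct model_opp model_table[] = {
	{  500000,  256,  100 },
	{ 1000000,  512,  300 },
	{ 2000000, 1024, 1000 },
};

#define MODEL_NR_OPPS (sizeof(model_table) / sizeof(model_table[0]))

/*
 * Assumed schedutil-like rule: target = 1.25 * max_freq * util / max_cap,
 * then take the lowest OPP at or above the target (or the highest one).
 */
static size_t model_pick_opp(const struct model_opp *t, size_t n,
			     unsigned long util, unsigned long max_cap)
{
	unsigned long max_freq = t[n - 1].freq_khz;
	unsigned long target = (max_freq + (max_freq >> 2)) * util / max_cap;
	size_t i;

	for (i = 0; i < n - 1; i++) {
		if (t[i].freq_khz >= target)
			break;
	}
	return i;
}

/* Estimated energy for running 'util' worth of work at the given OPP. */
static unsigned long model_energy(const struct model_opp *opp,
				  unsigned long util)
{
	return opp->power * util / opp->capacity;
}
```

With this table and util = 300, model_pick_opp() returns index 1 (the
1 GHz entry) and the estimate is 175 units. If the governor actually runs
the CPU at the top entry, the same formula gives 292 units. The placement
decision was made with the lower figure, which is less efficient than
intended but still functional.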

> >
> > ---
> > kernel/sched/topology.c | 10 +++++++---
> > 1 file changed, 7 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/kernel/sched/topology.c
> > ===================================================================
> > --- linux-pm.orig/kernel/sched/topology.c
> > +++ linux-pm/kernel/sched/topology.c
> > @@ -217,6 +217,7 @@ static bool sched_is_eas_possible(const
> >  	bool any_asym_capacity = false;
> >  	struct cpufreq_policy *policy;
> >  	struct cpufreq_governor *gov;
> > +	bool cpufreq_ok;
> >  	int i;
> >
> >  	/* EAS is enabled for asymmetric CPU capacity topologies. */
> > @@ -251,7 +252,7 @@ static bool sched_is_eas_possible(const
> >  		return false;
> >  	}
> >
> > -	/* Do not attempt EAS if schedutil is not being used. */
> > +	/* Do not attempt EAS if cpufreq is not configured adequately */
> >  	for_each_cpu(i, cpu_mask) {
> >  		policy = cpufreq_cpu_get(i);
> >  		if (!policy) {
> > @@ -261,11 +262,14 @@ static bool sched_is_eas_possible(const
> >  			}
> >  			return false;
> >  		}
> > +		/* Require schedutil or a "setpolicy" driver */
> >  		gov = policy->governor;
> > +		cpufreq_ok = gov == &schedutil_gov ||
> > +			(!gov && policy->policy != CPUFREQ_POLICY_UNKNOWN);
> >  		cpufreq_cpu_put(policy);
> > -		if (gov != &schedutil_gov) {
> > +		if (!cpufreq_ok) {
> >  			if (sched_debug()) {
> > -				pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
> > +				pr_info("rd %*pbl: Checking EAS, unsuitable cpufreq governor\n",
> >  					cpumask_pr_args(cpu_mask));
> >  			}
> >  			return false;
>
> The logic here looks fine to me FWIW.
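
For reference, the condition can be modeled outside the kernel as follows.
This is only a sketch: the model_* types are simplified stand-ins for
struct cpufreq_policy and struct cpufreq_governor, and
MODEL_POLICY_UNKNOWN plays the role of CPUFREQ_POLICY_UNKNOWN. None of
these names exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the policy->policy field; hypothetical names. */
enum model_policy_mode {
	MODEL_POLICY_UNKNOWN = 0,	/* no "setpolicy" mode in effect */
	MODEL_POLICY_POWERSAVE,
	MODEL_POLICY_PERFORMANCE,
};

/* Stand-in for a generic cpufreq governor. */
struct model_governor {
	const char *name;
};

/* Stand-in for the two policy fields the check looks at. */
struct model_policy {
	const struct model_governor *governor;	/* NULL with "setpolicy" drivers */
	enum model_policy_mode policy;
};

static const struct model_governor model_schedutil = { "schedutil" };
static const struct model_governor model_ondemand = { "ondemand" };

/*
 * Same shape as the patched condition: accept schedutil, or no governor
 * at all combined with a known "setpolicy" mode.
 */
static bool model_cpufreq_ok(const struct model_policy *p)
{
	const struct model_governor *gov = p->governor;

	return gov == &model_schedutil ||
	       (!gov && p->policy != MODEL_POLICY_UNKNOWN);
}
```

In this model, a policy using model_schedutil passes, one using
model_ondemand fails, a governor-less policy in powersave or performance
mode passes, and a governor-less policy in the unknown mode fails.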
>
>