Re: [RFC][PATCH v0.1 5/6] sched/topology: Allow .setpolicy() cpufreq drivers to enable EAS

From: Rafael J. Wysocki
Date: Mon Nov 11 2024 - 08:55:10 EST


On Mon, Nov 11, 2024 at 12:54 PM Christian Loehle
<christian.loehle@xxxxxxx> wrote:
>
> On 11/8/24 16:41, Rafael J. Wysocki wrote:
> > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> >
> > Some cpufreq drivers, like intel_pstate, have built-in governors that
> > are used instead of regular cpufreq governors, schedutil in particular,
> > but they can work with EAS just fine, so allow EAS to be used with
> > those drivers.
> >
> > Also update the debug message printed when the cpufreq governor in
> > use is not schedutil and the related comment, to better match the
> > code after the change.
> >
> > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > ---
> >
> > I'm not sure how much value there is in refusing to enable EAS without
> > schedutil in general. For instance, if there are no crossover points
> > between the cost curves for different perf domains, EAS may as well be
> > used with the performance and powersave governors AFAICS.
>
> Agreed, but having no cross-over points or no DVFS at all should be the
> only instances, right?

Not really. This is the most obvious case, but there are other less
obvious ones.

Say there are two cross-over points: The "performance" and
"powersave" governors should still be fine with EAS in that case.

Or what if somebody has a governor in user space that generally
behaves like schedutil?

Or what about ondemand? Is it alway completely broken with EAS?

> For plain (non-intel_pstate) powersave and performance we could replace
> sugov_effective_cpu_perf()
> that determines the OPP of the perf-domain by the OPP they will be
> choosing, but for the rest?

I generally think that depending on schedutil for EAS is a mistake.

I would just print a warning that results may be suboptimal or
generally not as expected if the cpufreq governor is not schedutil
instead of preventing EAS from running at all.

> Also there is the entire uclamp thing, not sure what the best
> solution is there.
> Will intel_pstate just always ignore it? Might be better then to
> depend on !intel_pstate?

Well, it can be made dependent on policy->policy ==
CPUFREQ_POLICY_POWERSAVE if gov is NULL or similar, but honestly why
bother?

> > ---
> > kernel/sched/topology.c | 6 +++---
> > 1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > Index: linux-pm/kernel/sched/topology.c
> > ===================================================================
> > --- linux-pm.orig/kernel/sched/topology.c
> > +++ linux-pm/kernel/sched/topology.c
> > @@ -251,7 +251,7 @@ static bool sched_is_eas_possible(const
> > return false;
> > }
> >
> > - /* Do not attempt EAS if schedutil is not being used. */
> > + /* Do not attempt EAS with a cpufreq governor other than schedutil. */
> > for_each_cpu(i, cpu_mask) {
> > policy = cpufreq_cpu_get(i);
> > if (!policy) {
> > @@ -263,9 +263,9 @@ static bool sched_is_eas_possible(const
> > }
> > gov = policy->governor;
> > cpufreq_cpu_put(policy);
> > - if (gov != &schedutil_gov) {
> > + if (gov && gov != &schedutil_gov) {
> > if (sched_debug()) {
> > - pr_info("rd %*pbl: Checking EAS, schedutil is mandatory\n",
> > + pr_info("rd %*pbl: Checking EAS, cpufreq governor is not schedutil\n",
> > cpumask_pr_args(cpu_mask));
> > }
> > return false;
> >
> >
> >
> >
>
>