Re: [RFC][PATCH v021 4/9] sched/topology: Adjust cpufreq checks for EAS

From: Rafael J. Wysocki
Date: Wed Dec 11 2024 - 11:38:19 EST


On Wed, Dec 11, 2024 at 2:25 PM Vincent Guittot
<vincent.guittot@xxxxxxxxxx> wrote:
>
> On Wed, 11 Dec 2024 at 12:29, Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote:
> >
> > On Wed, Dec 11, 2024 at 11:33 AM Christian Loehle
> > <christian.loehle@xxxxxxx> wrote:
> > >
> > > On 11/29/24 16:00, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > > >
> > > > Make it possible to use EAS with cpufreq drivers that implement the
> > > > :setpolicy() callback instead of using generic cpufreq governors.
> > > >
> > > > This is going to be necessary for using EAS with intel_pstate in its
> > > > default configuration.
> > > >
> > > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > > > ---
> > > >
> > > > This is the minimum of what's needed, but I'd really prefer to move
> > > > the cpufreq vs EAS checks into cpufreq because messing around cpufreq
> > > > internals in topology.c feels like a butcher shop kind of exercise.
> > >
> > > Makes sense, something like cpufreq_eas_capable().
> > >
> > > >
> > > > Besides, as I said before, I remain unconvinced about the usefulness
> > > > of these checks at all. Yes, one is supposed to get the best results
> > > > from EAS when running schedutil, but what if they just want to try
> > > > something else with EAS? What if they can get better results with
> > > > that other thing, surprisingly enough?
> > >
> > > How do you imagine this to work then?
> > > I assume we don't make any 'resulting-OPP-guesses' like
> > > sugov_effective_cpu_perf() for any of the setpolicy governors.
> > > > Nor for dbs and, I guess, userspace.
> > > What about standard powersave and performance?
> > > Do we just have a cpufreq callback to ask which OPP to use for
> > > the energy calculation? Assume lowest/highest?
> > > (I don't think there is hardware where lowest/highest makes a
> > > difference, so maybe not bothering with the complexity could
> > > be an option, too.)
> >
> > In the "setpolicy" case there is no way to reliably predict the OPP
> > that is going to be used, so why bother?
> >
> > In the other cases, and if the OPPs are actually known, EAS may still
> > make assumptions regarding which of them will be used that will match
> > the schedutil selection rules, but if the cpufreq governor happens to
> > choose a different OPP, this is not the end of the world.
>
> Should we add a new cpufreq governor fops to return the best-guess estimate
> of the compute capacity selection? Something like
> cpufreq_effective_cpu_perf(cpu, actual, min, max).
> EAS needs to estimate what would be the next OPP; schedutil uses
> sugov_effective_cpu_perf() and other governors could provide their own.

Generally, yes. And documented for that matter.

It doesn't really tell you the OPP, though, only the performance level
that is going to be set for the given list of arguments IIUC. An energy
model is needed to find an OPP for the given perf level, or more
generally the cost of running at that level.
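
To illustrate the perf-level-to-OPP step, here is a minimal user-space
sketch, not kernel code. The struct perf_state layout and the helpers
find_state() and perf_to_cost() are hypothetical names made up for this
example; the only assumption taken from the discussion is that an energy
model gives a table of performance levels with an associated cost, sorted
by ascending performance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one energy-model performance state. */
struct perf_state {
	unsigned long performance;	/* capacity delivered at this OPP */
	unsigned long cost;		/* relative energy cost at this OPP */
};

/*
 * Return the index of the lowest state whose performance covers @perf,
 * or the highest state if none does. @table must be sorted by ascending
 * performance and must not be empty.
 */
static size_t find_state(const struct perf_state *table, size_t nr_states,
			 unsigned long perf)
{
	size_t i;

	assert(nr_states > 0);

	for (i = 0; i < nr_states; i++) {
		if (table[i].performance >= perf)
			return i;
	}

	return nr_states - 1;
}

/* Cost of running at the state selected for the requested perf level. */
static unsigned long perf_to_cost(const struct perf_state *table,
				  size_t nr_states, unsigned long perf)
{
	return table[find_state(table, nr_states, perf)].cost;
}
```

With a three-entry table of performance levels 256, 512 and 1024, a
requested perf level of 300 selects the 512 entry, and anything above 1024
falls back to the top entry.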

> > Yes, you could have been more energy-efficient had you chosen to use
> > schedutil, but you chose otherwise and that's what you get.
>
> Calling sugov_effective_cpu_perf() for a governor other than schedutil
> doesn't make sense.

It will work for intel_pstate in the "setpolicy" mode to a reasonable
approximation AFAICS.
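
For reference, the kind of estimate being discussed can be modelled in
user space as below. This is a sketch of the schedutil rule as I
understand it (add roughly 25% headroom to the utilization, then clamp
between min and max), not a copy of the kernel function. The names
add_headroom() and effective_perf() are made up for this example, and the
cpu argument is left out because the model does not need it.

```c
/* Add roughly 25% headroom to a utilization value. */
static unsigned long add_headroom(unsigned long util)
{
	return util + (util >> 2);
}

/*
 * Estimate the performance level that would be requested for @actual
 * utilization, given the @min and @max performance limits.
 */
static unsigned long effective_perf(unsigned long actual,
				    unsigned long min,
				    unsigned long max)
{
	/* Leave some room for the utilization to grow. */
	actual = add_headroom(actual);

	/* There is no need to ask for more than the headroom-adjusted value. */
	if (actual < max)
		max = actual;

	/* Never go below the requested minimum. */
	return min > max ? min : max;
}
```

Whether a governor other than schedutil follows the same rule closely
enough is exactly the open question here; the sketch only shows what the
estimate computes.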

> And do we handle the case when
> CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not selected?

I don't think it's necessary to handle it.