Re: [PATCH v2 3/3] PM / EM: Skip inefficient OPPs

From: Vincent Donnefort
Date: Tue May 25 2021 - 05:59:50 EST


On Tue, May 25, 2021 at 10:21:37AM +0100, Vincent Donnefort wrote:
> On Tue, May 25, 2021 at 10:48:23AM +0200, Peter Zijlstra wrote:
> > On Fri, May 21, 2021 at 05:54:24PM +0100, Vincent Donnefort wrote:
> > > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > > index 4f09afd..5a91a2b 100644
> > > --- a/kernel/sched/cpufreq_schedutil.c
> > > +++ b/kernel/sched/cpufreq_schedutil.c
> > > @@ -10,6 +10,7 @@
> > >
> > > #include "sched.h"
> > >
> > > +#include <linux/energy_model.h>
> > > #include <linux/sched/cpufreq.h>
> > > #include <trace/events/power.h>
> > >
> > > @@ -153,6 +154,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> > >
> > > freq = map_util_freq(util, freq, max);
> > >
> > > + /* Avoid inefficient performance states */
> > > + freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq);
> > > +
> > > if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> > > return sg_policy->next_freq;
> > >
> >
> > This seems somewhat unfortunate, it adds a loop over the OPPs only to
> > then call into cpufreq to do the exact same thing again :/
>
> Indeed, but it would be complicated to avoid the double loop:
>
> It is possible to register OPPs (and by extension perf_states) for a
> frequency for which, the cpufreq table entry is marked with
> CPUFREQ_ENTRY_INVALID. It would probably be an issue that would have to be
> fixed in the driver, but it is currently allowed.
>
> More importantly, while resolving the frequency, we also cache the index in
> cached_resolved_idx. Some drivers, such as qcom-cpufreq-hw rely on this
> value for their fastswitch support.

Unless we are ok bringing the cpufreq idx into the Energy Model. But I
originally dismissed this idea. I didn't want to bring a such dependency between
the two frameworks, especially as the EM can also work with Devfreq. But maybe
it's worth it in the end. Any thoughts?

>
> Notice though, we would iterate over the EM only in the case where the
> performance state has found inefficiencies.
>