Re: [PATCH v2 3/3] PM / EM: Skip inefficient OPPs

From: Vincent Donnefort
Date: Tue May 25 2021 - 05:20:46 EST


On Tue, May 25, 2021 at 10:48:23AM +0200, Peter Zijlstra wrote:
> On Fri, May 21, 2021 at 05:54:24PM +0100, Vincent Donnefort wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 4f09afd..5a91a2b 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -10,6 +10,7 @@
> >
> > #include "sched.h"
> >
> > +#include <linux/energy_model.h>
> > #include <linux/sched/cpufreq.h>
> > #include <trace/events/power.h>
> >
> > @@ -153,6 +154,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >
> > freq = map_util_freq(util, freq, max);
> >
> > + /* Avoid inefficient performance states */
> > + freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq);
> > +
> > if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> > return sg_policy->next_freq;
> >
>
> This seems somewhat unfortunate, it adds a loop over the OPPs only to
> then call into cpufreq to do the exact same thing again :/

Indeed, but it would be complicated to avoid the double loop:

It is possible to register OPPs (and by extension perf_states) for a
frequency for which, the cpufreq table entry is marked with
CPUFREQ_ENTRY_INVALID. It would probably be an issue that would have to be
fixed in the driver, but it is currently allowed.

More importantly, while resolving the frequency, we also cache the index in
cached_resolved_idx. Some drivers, such as qcom-cpufreq-hw rely on this
value for their fastswitch support.

Notice though, we would iterate over the EM only in the case where the
performance state has found inefficiencies.