Re: [PATCH v2 3/3] PM / EM: Skip inefficient OPPs

From: Viresh Kumar
Date: Fri May 28 2021 - 01:09:42 EST


On 25-05-21, 10:48, Peter Zijlstra wrote:
> On Fri, May 21, 2021 at 05:54:24PM +0100, Vincent Donnefort wrote:
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 4f09afd..5a91a2b 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -10,6 +10,7 @@
> >
> > #include "sched.h"
> >
> > +#include <linux/energy_model.h>
> > #include <linux/sched/cpufreq.h>
> > #include <trace/events/power.h>
> >
> > @@ -153,6 +154,9 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy,
> >
> > freq = map_util_freq(util, freq, max);
> >
> > + /* Avoid inefficient performance states */
> > + freq = em_pd_get_efficient_freq(em_cpu_get(policy->cpu), freq);
> > +
> > if (freq == sg_policy->cached_raw_freq && !sg_policy->need_freq_update)
> > return sg_policy->next_freq;
> >
>
> This seems somewhat unfortunate, it adds a loop over the OPPs only to
> then call into cpufreq to do the exact same thing again :/

And that's why I feel it needs to be done at a single place, either disable the
OPP (which seems like a bad option based on what Lukasz and Vincent said
earlier), or make changes in the cpufreq core itself to search for the best
frequency (like adding another API to mark some frequencies as inefficient, and
take that into account while selecting next freq).

There is a potential of ending up selecting the wrong frequency here because
there are too many decision making bodies here and so corner cases.

--
viresh