Re: [PATCH v1 3/4] cpufreq: Add special-purpose fast-switching callback for drivers

From: Rafael J. Wysocki
Date: Tue Dec 15 2020 - 10:39:14 EST


On Tue, Dec 15, 2020 at 5:17 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
>
> On 08-12-20, 14:32, Viresh Kumar wrote:
> > On 07-12-20, 17:35, Rafael J. Wysocki wrote:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > >
> > > First off, some cpufreq drivers (e.g. intel_pstate) can pass hints
> > > beyond the current target frequency to the hardware and there are no
> > > provisions for doing that in the cpufreq framework. In particular,
> > > today the driver has to assume that it should not allow the frequency
> > > to fall below the one requested by the governor (or the required
> > > capacity may not be provided) which may not be the case and which may
> > > lead to excessive energy usage in some scenarios.
> > >
> > > Second, the hints passed by these drivers to the hardware need not be
> > > in terms of the frequency, so representing the utilization numbers
> > > coming from the scheduler as frequency before passing them to those
> > > drivers is not really useful.
> > >
> > > Address the two points above by adding a special-purpose replacement
> > > for the ->fast_switch callback, called ->adjust_perf, allowing the
> > > governor to pass abstract performance level (rather than frequency)
> > > values for the minimum (required) and target (desired) performance
> > > along with the CPU capacity to compare them to.
> > >
> > > Also update the schedutil governor to use the new callback instead
> > > of ->fast_switch if present.
> > >
> > > Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > > ---
> > >
> > > Changes with respect to the RFC:
> > > - Don't pass "busy" to ->adjust_perf().
> > > - Use a special 'update_util' hook for the ->adjust_perf() case in
> > > schedutil (this still requires an additional branch because of the
> > > shared common code between this case and the "frequency" one, but
> > > IMV this version is cleaner nevertheless).
> > >
> > > ---
> > > drivers/cpufreq/cpufreq.c | 40 ++++++++++++++++++++++++++++++++
> > > include/linux/cpufreq.h | 14 +++++++++++
> > > include/linux/sched/cpufreq.h | 5 ++++
> > > kernel/sched/cpufreq_schedutil.c | 48 +++++++++++++++++++++++++++++++--------
> > > 4 files changed, 98 insertions(+), 9 deletions(-)
> > >
> > > Index: linux-pm/include/linux/cpufreq.h
> > > ===================================================================
> > > --- linux-pm.orig/include/linux/cpufreq.h
> > > +++ linux-pm/include/linux/cpufreq.h
> > > @@ -320,6 +320,15 @@ struct cpufreq_driver {
> > > unsigned int index);
> > > unsigned int (*fast_switch)(struct cpufreq_policy *policy,
> > > unsigned int target_freq);
> > > + /*
> > > + * ->fast_switch() replacement for drivers that use an internal
> > > + * representation of performance levels and can pass hints other than
> > > + * the target performance level to the hardware.
> > > + */
> > > + void (*adjust_perf)(unsigned int cpu,
> > > + unsigned long min_perf,
> > > + unsigned long target_perf,
> > > + unsigned long capacity);
> >
> > With this callback in place, do we still need to keep the other stuff we
> > introduced recently, like CPUFREQ_NEED_UPDATE_LIMITS ?
>
> Ping

Missed this one, sorry.

We still need those things for the other governors.
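
To illustrate the split described in the quoted changelog, here is a rough userspace sketch of how a governor could prefer ->adjust_perf when a driver provides it and fall back to ->fast_switch otherwise. It is not the schedutil code from the patch: struct mock_driver, mock_governor_update() and the mock_* callbacks are hypothetical names, the ->fast_switch stand-in takes a CPU number instead of a struct cpufreq_policy pointer, and the linear performance-to-frequency mapping is a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct cpufreq_driver. */
struct mock_driver {
	/* Simplified: the real callback takes a struct cpufreq_policy *. */
	unsigned int (*fast_switch)(unsigned int cpu, unsigned int target_freq);
	/* Same argument list as the ->adjust_perf callback in the patch. */
	void (*adjust_perf)(unsigned int cpu, unsigned long min_perf,
			    unsigned long target_perf, unsigned long capacity);
};

/* Values recorded by the mock callbacks so that callers can inspect them. */
static unsigned long last_min_perf, last_target_perf, last_capacity;
static unsigned int last_freq;

static void mock_adjust_perf(unsigned int cpu, unsigned long min_perf,
			     unsigned long target_perf, unsigned long capacity)
{
	(void)cpu;
	last_min_perf = min_perf;
	last_target_perf = target_perf;
	last_capacity = capacity;
}

static unsigned int mock_fast_switch(unsigned int cpu, unsigned int target_freq)
{
	(void)cpu;
	last_freq = target_freq;
	return target_freq;
}

/*
 * Hypothetical governor-side dispatch: pass abstract performance levels
 * to ->adjust_perf if present, otherwise convert the target level to a
 * frequency (assumed linear in perf / capacity) for ->fast_switch.
 */
static void mock_governor_update(const struct mock_driver *drv,
				 unsigned int cpu, unsigned long min_perf,
				 unsigned long target_perf,
				 unsigned long capacity, unsigned int max_freq)
{
	assert(capacity > 0);

	/* Keep min_perf <= target_perf <= capacity. */
	if (target_perf > capacity)
		target_perf = capacity;
	if (min_perf > target_perf)
		min_perf = target_perf;

	if (drv->adjust_perf) {
		drv->adjust_perf(cpu, min_perf, target_perf, capacity);
		return;
	}

	if (drv->fast_switch) {
		unsigned long long freq =
			(unsigned long long)max_freq * target_perf / capacity;

		drv->fast_switch(cpu, (unsigned int)freq);
	}
}
```

In this sketch, a driver that sets both callbacks only ever sees the ->adjust_perf call, while a driver without it gets a frequency derived from the target level alone, so the minimum (required) level is lost on that path.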