Re: [PATCH 1/2] cpufreq/schedutil: Add fast_switch callback check

From: Wyes Karny
Date: Wed Apr 12 2023 - 02:09:20 EST


Hi Rafael,

On 11 Apr 17:55, Rafael J. Wysocki wrote:
> On Mon, Apr 10, 2023 at 11:53 AM Wyes Karny <wyes.karny@xxxxxxx> wrote:
> >
> > The set value of `fast_switch_enabled` flag doesn't guarantee that
> > fast_switch callback is set. For some drivers such as amd_pstate, the
> > adjust_perf callback is used but it still sets `fast_switch_possible`
> > flag. This is not wrong because this flag doesn't imply fast_switch
> > callback is set, it implies whether the driver can guarantee that
> > frequency can be changed on any CPU sharing the policy and that the
> > change will affect all of the policy CPUs without the need to send any
> > IPIs or issue callbacks from the notifier chain. Therefore add an extra
> > NULL check before calling fast_switch in sugov_update_single_freq
> > function.
> >
> > Ideally `sugov_update_single_freq` function should not be called with
> > amd_pstate. But in a corner case scenario, when aperf/mperf overflow
> > occurs, kernel disables frequency invariance calculation which causes
> > schedutil to fallback to sugov_update_single_freq which currently relies
> > on the fast_switch callback.
>
> Yes, it does. Which is why that callback must be provided if the
> driver sets fast_switch_enabled.
>
> Overall, adjust_perf is optional, but fast_switch_enabled can only be
> set if fast_switch is actually present.

Then should the below logic be changed in sugov_start function?

'''
else if (policy->fast_switch_enabled && cpufreq_driver_has_adjust_perf())
uu = sugov_update_single_perf;
'''

This logic restricts the selection of adjust_perf function based on
fast_switch_enabled flag. If this fast_switch_enabled check is removed
then amd_pstate driver can disable this flag and shedutil can select
adjust_perf without this dependency.

Thanks,
Wyes
>
> Please fix the driver.
>
> >
> > Normal flow:
> > sugov_update_single_perf
> > cpufreq_driver_adjust_perf
> > cpufreq_driver->adjust_perf
> >
> > Error case flow:
> > sugov_update_single_perf
> > sugov_update_single_freq <-- This is chosen because the freq invariant is disabled due to aperf/mperf overflow
> > cpufreq_driver_fast_switch
> > cpufreq_driver->fast_switch <-- Here NULL pointer dereference is happening, because fast_switch is not set
> >
> > Fix this NULL pointer dereference issue by doing a NULL check.
> >
> > Fixes: a61dec744745 ("cpufreq: schedutil: Avoid missing updates for one-CPU policies")
> > Signed-off-by: Wyes Karny <wyes.karny@xxxxxxx>
> >
> > Cc: "Rafael J. Wysocki" <rafael@xxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx
> > ---
> > drivers/cpufreq/cpufreq.c | 11 +++++++++++
> > include/linux/cpufreq.h | 1 +
> > kernel/sched/cpufreq_schedutil.c | 2 +-
> > 3 files changed, 13 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
> > index 6d8fd3b8dcb5..364d31b55380 100644
> > --- a/drivers/cpufreq/cpufreq.c
> > +++ b/drivers/cpufreq/cpufreq.c
> > @@ -2138,6 +2138,17 @@ unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
> > }
> > EXPORT_SYMBOL_GPL(cpufreq_driver_fast_switch);
> >
> > +/**
> > + * cpufreq_driver_has_fast_switch - Check "fast switch" callback.
> > + *
> > + * Return 'true' if the ->fast_switch callback is present for the
> > + * current driver or 'false' otherwise.
> > + */
> > +bool cpufreq_driver_has_fast_switch(void)
> > +{
> > + return !!cpufreq_driver->fast_switch;
> > +}
> > +
> > /**
> > * cpufreq_driver_adjust_perf - Adjust CPU performance level in one go.
> > * @cpu: Target CPU.
> > diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
> > index 65623233ab2f..8a9286fc718b 100644
> > --- a/include/linux/cpufreq.h
> > +++ b/include/linux/cpufreq.h
> > @@ -604,6 +604,7 @@ struct cpufreq_governor {
> > /* Pass a target to the cpufreq driver */
> > unsigned int cpufreq_driver_fast_switch(struct cpufreq_policy *policy,
> > unsigned int target_freq);
> > +bool cpufreq_driver_has_fast_switch(void);
> > void cpufreq_driver_adjust_perf(unsigned int cpu,
> > unsigned long min_perf,
> > unsigned long target_perf,
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index e3211455b203..a1c449525ac2 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -364,7 +364,7 @@ static void sugov_update_single_freq(struct update_util_data *hook, u64 time,
> > * concurrently on two different CPUs for the same target and it is not
> > * necessary to acquire the lock in the fast switch case.
> > */
> > - if (sg_policy->policy->fast_switch_enabled) {
> > + if (sg_policy->policy->fast_switch_enabled && cpufreq_driver_has_fast_switch()) {
> > cpufreq_driver_fast_switch(sg_policy->policy, next_f);
> > } else {
> > raw_spin_lock(&sg_policy->update_lock);
> > --
> > 2.34.1
> >