Re: [PATCH 1/2] cpufreq: schedutil: Fix superfluous updates caused by need_freq_update
From: Rafael J. Wysocki
Date: Thu Apr 10 2025 - 15:16:39 EST
Hi,
Sorry for kind of a retroactive response.
On Tue, Apr 8, 2025 at 10:59 AM Stephan Gerhold
<stephan.gerhold@xxxxxxxxxx> wrote:
>
> Hi,
>
> On Wed, Dec 11, 2024 at 05:57:32PM -0800, Sultan Alsawaf wrote:
> > From: "Sultan Alsawaf (unemployed)" <sultan@xxxxxxxxxxxxxxx>
> >
> > A redundant frequency update is only truly needed when there is a policy
> > limits change with a driver that specifies CPUFREQ_NEED_UPDATE_LIMITS.
> >
> > In spite of that, drivers specifying CPUFREQ_NEED_UPDATE_LIMITS receive a
> > frequency update _all the time_, not just for a policy limits change,
> > because need_freq_update is never cleared.
> >
> > Furthermore, ignore_dl_rate_limit()'s usage of need_freq_update also leads
> > to a redundant frequency update, regardless of whether or not the driver
> > specifies CPUFREQ_NEED_UPDATE_LIMITS, when the next chosen frequency is the
> > same as the current one.
> >
> > Fix the superfluous updates by only honoring CPUFREQ_NEED_UPDATE_LIMITS
> > when there's a policy limits change, and clearing need_freq_update when a
> > requisite redundant update occurs.
> >
> > This is neatly achieved by moving up the CPUFREQ_NEED_UPDATE_LIMITS test
> > and instead setting need_freq_update to false in sugov_update_next_freq().
> >
> > Signed-off-by: Sultan Alsawaf (unemployed) <sultan@xxxxxxxxxxxxxxx>
> > ---
> > kernel/sched/cpufreq_schedutil.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> > index 28c77904ea74..e51d5ce730be 100644
> > --- a/kernel/sched/cpufreq_schedutil.c
> > +++ b/kernel/sched/cpufreq_schedutil.c
> > @@ -83,7 +83,7 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
> >
> > if (unlikely(sg_policy->limits_changed)) {
> > sg_policy->limits_changed = false;
> > - sg_policy->need_freq_update = true;
> > + sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > return true;
> > }
> >
> > @@ -96,7 +96,7 @@ static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
> > unsigned int next_freq)
> > {
> > if (sg_policy->need_freq_update)
> > - sg_policy->need_freq_update = cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS);
> > + sg_policy->need_freq_update = false;
> > else if (sg_policy->next_freq == next_freq)
> > return false;
> >
>
> This patch breaks cpufreq throttling (e.g. for thermal cooling) for
> cpufreq drivers that:
>
> - Have policy->fast_switch_enabled/fast_switch_possible set, but
> - Do not have CPUFREQ_NEED_UPDATE_LIMITS flag set
>
> There are several examples for this in the tree (search for
> "fast_switch_possible"). Of all those drivers, only intel-pstate and
> amd-pstate (sometimes) set CPUFREQ_NEED_UPDATE_LIMITS.
>
> I can reliably reproduce this with scmi-cpufreq on a Qualcomm X1E
> laptop:
>
> 1. I added some low temperature trip points in the device tree,
> together with passive cpufreq cooling.
> 2. I run a CPU stress test on all CPUs and monitor the temperatures
> and CPU frequencies.
>
> When using "performance" governor instead of "schedutil", the CPU
> frequencies are being throttled as expected, as soon as the temperature
> trip points are reached.
>
> When using "schedutil", the CPU frequencies stay at maximum as long as
> the stress test is running. No throttling happens, so the device heats
> up far beyond the defined temperature trip points. Throttling is applied
> only after stopping the stress test, since this forces schedutil to
> re-evaluate the CPU frequency.
>
> Reverting this commit fixes the problem.
>
> Looking at the code, I think the problem is that:
> - sg_policy->limits_changed does not result in
> sg->policy->need_freq_update without CPUFREQ_NEED_UPDATE_LIMITS
> anymore, and
> - Without sg->policy->need_freq_update, get_next_freq() skips calling
> cpufreq_driver_resolve_freq(), which would normally apply the policy
> min/max constraints.
>
> Do we need to set CPUFREQ_NEED_UPDATE_LIMITS for all cpufreq drivers
> that set policy->fast_switch_possible?
I think that it is generally needed for ->fast_switch() because the
new limits don't take effect without running ->fast_switch().
But if that is the case, the behavior can just be made conditional on
policy->fast_switch_enabled.
> If I'm reading the documentation
> comment correctly, that flag is just supposed to enable notifications if
> the policy min/max changes, but the resolved target frequency is still
> the same.
If the policy min/max change and the resolved target frequency is
beyond the new limits. it needs to be changed.
The flag effectively says "Call my ->fast_switch() if the policy
limits have changed regardless of whether or not there is another
reason to call it."
> This is not the case here, the target frequency needs to be
> throttled, but schedutil isn't applying the new limits.
>
> Any suggestions how to fix this? I'm happy to test patches with my setup.
Replace cpufreq_driver_test_flags(CPUFREQ_NEED_UPDATE_LIMITS) in
sugov_should_update_freq() with sg_policy->policy->fast_switch_enabled
and (if this works), CPUFREQ_NEED_UPDATE_LIMITS can be deleted.