Re: [PATCH v6 1/4] cpufreq: Remove per-CPU QoS constraint

From: Pierre Gondois

Date: Tue Mar 24 2026 - 07:45:22 EST


Hello Viresh,

On 3/20/26 10:18, Viresh Kumar wrote:
On 19-03-26, 10:30, Pierre Gondois wrote:
On 3/18/26 12:13, Viresh Kumar wrote:
On 17-03-26, 11:17, Pierre Gondois wrote:
policy->max_freq_req represents the maximum allowed frequency as
requested by the policyX/scaling_max_freq sysfs file. This request
applies to all CPUs of the policy. It is not possible to request
a per-CPU maximum frequency.

Thus, the interaction between the policy boost and scaling_max_freq
settings should be handled by adding a boost specific QoS constraint.
This will be handled in the following patches.
I don't think the above is required anymore. This patch is removing stale code
now which isn't useful anymore. It has nothing to do with a boost specific QOS
constraint.
Yes ok
And it would be better to know for sure why this isn't required anymore and
which patch exactly fixed this issue.

On a kernel based on 1608f0230510~, and replicating the
process described in the commit message of

commit 1608f0230510 ("cpufreq: Fix re-boost issue after hotplugging
a CPU")

I could not see any issue regarding the values of:

- policy1/cpuinfo_max_freq
- policy1/scaling_max_freq
The commit message (of 1608f0230510) is confusing. The issue was discussed
properly in the following thread.

https://lore.kernel.org/all/20250120082723.am7rxujmdvzz4eky@vireshk-i7/

The problem is that policy->max and policy->cpuinfo_max_freq are incorrect after
the sequence mentioned in the commit, while max_freq_req is correct.

I experimented a bit more and it seems the following happens:

1. boost all CPUs: echo 1 > /sys/devices/system/cpu/cpufreq/boost
2. offline one CPU: echo 0 > /sys/devices/system/cpu/cpuX/online
3. deboost all CPUs: echo 0 > /sys/devices/system/cpu/cpufreq/boost

cpufreq_boost_trigger_state()
\-for_each_active_policy()
  \-cpufreq_driver->set_boost()
doesn't act on the policy where there are no more online CPUs,
so the max/cpuinfo.max/max_freq_req is left to the actual
boost freq.

4. online CPUX: echo 1 > /sys/devices/system/cpu/cpuX/online

cpufreq_online()
\-cpufreq_driver->init()
  \-cppc_cpufreq_cpu_init()
There:
- policy->max
- policy->cpuinfo.max_freq
are set to the maximal non-boost freq., which is the correct value.

However, max_freq_req is left to the boosted frequency, so this
is effectively an incorrect state.
Also in cpufreq_set_policy(), policy->max is set to:
  min(max_freq_req, cpuinfo.max_freq)
(cf. verify() cb), so the incorrect state of max_freq_req is not
visible.

5. boost all CPUs again: echo 1 > /sys/devices/system/cpu/cpufreq/boost

As the max_freq_req value and the new boost value are equal,
cpufreq_notifier_max() won't be called, which means that if the
CPU needed to raise its freq., it won't be notified until another
event trigger a re-evaluation of the freq. selection.

To observe that, I had to:
- use the performance governor to be sure to select the max. available
  freq.
- observe scaling_cur_freq to see the last requested freq. for the CPU

So IMO the issue was actually fixed by:
dd016f379ebc ("cpufreq: Introduce a more generic way to set default
per-policy boost flag")
which sets the correct max_freq_req value when putting back an
inactive policy.



I think another commit has fixed that (incorrectly and unintentionally):
commit 6db0f533d320 ("cpufreq: preserve freq_table_sorted across suspend/hibernate")

@@ -1421,9 +1421,12 @@ static int cpufreq_policy_online(struct cpufreq_policy *policy,
* If there is a problem with its frequency table, take it
* offline and drop it.
*/
- ret = cpufreq_table_validate_and_sort(policy);
- if (ret)
- goto out_offline_policy;
+ if (policy->freq_table_sorted != CPUFREQ_TABLE_SORTED_ASCENDING &&
+ policy->freq_table_sorted != CPUFREQ_TABLE_SORTED_DESCENDING) {
+ ret = cpufreq_table_validate_and_sort(policy);
+ if (ret)
+ goto out_offline_policy;
+ }

This skipped calling cpufreq_table_validate_and_sort() completely on online and
so max/cpuinfo_max_freq, max_freq_req are all in sync.

That change should be fixed with:

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 277884d91913..1f794524a1d9 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1427,12 +1427,9 @@ static int cpufreq_policy_online(struct cpufreq_policy *policy,
* If there is a problem with its frequency table, take it
* offline and drop it.
*/
- if (policy->freq_table_sorted != CPUFREQ_TABLE_SORTED_ASCENDING &&
- policy->freq_table_sorted != CPUFREQ_TABLE_SORTED_DESCENDING) {
- ret = cpufreq_table_validate_and_sort(policy);
- if (ret)
- goto out_offline_policy;
- }
+ ret = cpufreq_table_validate_and_sort(policy);
+ if (ret)
+ goto out_offline_policy;

/* related_cpus should at least include policy->cpus. */
cpumask_copy(policy->related_cpus, policy->cpus);
diff --git a/drivers/cpufreq/freq_table.c b/drivers/cpufreq/freq_table.c
index 7f251daf03ce..5b364d8da4f9 100644
--- a/drivers/cpufreq/freq_table.c
+++ b/drivers/cpufreq/freq_table.c
@@ -360,6 +360,10 @@ int cpufreq_table_validate_and_sort(struct cpufreq_policy *policy)
if (policy_has_boost_freq(policy))
policy->boost_supported = true;

+ if (policy->freq_table_sorted == CPUFREQ_TABLE_SORTED_ASCENDING ||
+ policy->freq_table_sorted == CPUFREQ_TABLE_SORTED_DESCENDING)
+ return 0;
+
return set_freq_table_sorted(policy);
}