Re: [PATCH 1/1] cpufreq: Set policy->min and max as real QoS constraints

From: Zhongqiu Han

Date: Wed May 06 2026 - 08:51:57 EST


On 4/30/2026 9:41 PM, Pierre Gondois wrote:
Hello Zhongqiu,

On 4/29/26 15:00, Zhongqiu Han wrote:
On 4/23/2026 4:47 PM, Pierre Gondois wrote:
cpufreq_set_policy() will ultimately override the policy min/max
values written in the .init() callback through:
cpufreq_policy_online()
\-cpufreq_init_policy()
   \-cpufreq_set_policy()
     \-/* Set policy->min/max */
Thus the policy min/max values provided are only temporary.

There is an exception if CPUFREQ_NEED_INITIAL_FREQ_CHECK is set and:
cpufreq_policy_online()
\-cpufreq_init_policy()
   \-__cpufreq_driver_target()
     \-cpufreq_driver->target()
is called. To avoid any regression, set policy->min/max in cpufreq.c
if the values were not initialized.

In this patch:
- Setting policy->min or max value in driver .init() cb is
   interpreted as setting a QoS constraint.
- Remove policy->min/max initialization in drivers if the values
   are similar to policy->cpuinfo.min_freq/max_freq.
   The only drivers where these values are different are:
   - gx-suspmod.c
   - cppc-cpufreq.c
   - longrun.c
- For the cppc-cpufreq driver, the lowest non-linear freq. is
   used as a min QoS constraint as suggested at:
https://lore.kernel.org/lkml/20260213100633.15413-1- zhangpengjie2@xxxxxxxxxx/

Signed-off-by: Pierre Gondois <pierre.gondois@xxxxxxx>


Hi Pierre,
Thanks for the patch. I have a few additional inline comments/questions
below.


---
  drivers/cpufreq/amd-pstate.c      | 16 ++++++++--------
  drivers/cpufreq/cppc_cpufreq.c    | 11 +++++++----
  drivers/cpufreq/cpufreq-nforce2.c |  4 ++--
  drivers/cpufreq/cpufreq.c         | 19 +++++++++++++++++--
  drivers/cpufreq/freq_table.c      |  7 +++----
  drivers/cpufreq/gx-suspmod.c      |  9 +++++----
  drivers/cpufreq/intel_pstate.c    |  3 ---
  drivers/cpufreq/pcc-cpufreq.c     |  8 ++++----
  drivers/cpufreq/pxa3xx-cpufreq.c  |  4 ++--
  drivers/cpufreq/sh-cpufreq.c      |  4 ++--
  drivers/cpufreq/virtual-cpufreq.c |  5 +----
  11 files changed, 51 insertions(+), 39 deletions(-)

diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
index 453084c67327f..1ed4bcdcc957f 100644
--- a/drivers/cpufreq/amd-pstate.c
+++ b/drivers/cpufreq/amd-pstate.c
@@ -1090,10 +1090,10 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy)
        perf = READ_ONCE(cpudata->perf);
  -    policy->cpuinfo.min_freq = policy->min = perf_to_freq(perf,
-                                  cpudata->nominal_freq,
-                                  perf.lowest_perf);
-    policy->cpuinfo.max_freq = policy->max = cpudata->max_freq;
+    policy->cpuinfo.min_freq = perf_to_freq(perf,
+                        cpudata->nominal_freq,
+                        perf.lowest_perf);
+    policy->cpuinfo.max_freq = cpudata->max_freq;


It is better to update doc as well to avoid new dirver developmenter set
policy->min / policy->max again?

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/ tree/Documentation/cpu-freq/cpu-drivers.rst#n102

Yes sure, does the following works for you:
(tabs might not be conserved in the mail)

diff --git a/Documentation/cpu-freq/cpu-drivers.rst b/Documentation/cpu- freq/cpu-drivers.rst
index c5635ac3de547..12dc4dcbdd5a6 100644
--- a/Documentation/cpu-freq/cpu-drivers.rst
+++ b/Documentation/cpu-freq/cpu-drivers.rst
@@ -114,8 +114,14 @@ Then, the driver must fill in the following values:
 |policy->cur                       | The current operating frequency of   |
 |                                  | this CPU (if appropriate)         |
 +----------------------------------- +--------------------------------------+
-|policy->min,                      |             |
-|policy->max,                      |             |
+|policy->min                       | Minimum frequency QoS constraint.   |
+|                                  | Can be overwritten by writing to    |
+|                                  | scaling_min sysfs file.         |
++----------------------------------- +--------------------------------------+
+|policy->max                       | Maximum frequency QoS constraint.   |
+|                                  | Can be overwritten by writing to    |
+|                                  | scaling_max sysfs file.         |
++----------------------------------- +--------------------------------------+
 |policy->policy and, if necessary,  |            |
 |policy->governor                  | must contain the "default policy" for|
 |                                  | this CPU. A few moments later,    |



Thanks Pierre,

Just shared a minor nit and used 'If' to indicate that it's optional.
For example, if you think it makes sense, you may consider:

+|policy->min | If set by the driver in ->init(), used as the |
+| | initial minimum frequency QoS request. |
++-------------------------------------------------------------------+
+|policy->max | If set by the driver in ->init(), used as the |
+| | initial maximum frequency QoS request. |



        policy->driver_data = cpudata;
      ret = amd_pstate_cppc_enable(policy);
@@ -1907,10 +1907,10 @@ static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy)
        perf = READ_ONCE(cpudata->perf);
  -    policy->cpuinfo.min_freq = policy->min = perf_to_freq(perf,
-                                  cpudata->nominal_freq,
-                                  perf.lowest_perf);
-    policy->cpuinfo.max_freq = policy->max = cpudata->max_freq;
+    policy->cpuinfo.min_freq = perf_to_freq(perf,
+                        cpudata->nominal_freq,
+                        perf.lowest_perf);
+    policy->cpuinfo.max_freq = cpudata->max_freq;
      policy->driver_data = cpudata;
        ret = amd_pstate_cppc_enable(policy);
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/ cppc_cpufreq.c
index 7e7f9dfb7a24c..c6fcecdbbab0c 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -645,6 +645,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
      unsigned int cpu = policy->cpu;
      struct cppc_cpudata *cpu_data;
      struct cppc_perf_caps *caps;
+    unsigned int min, max;
      int ret;
        cpu_data = cppc_cpufreq_get_cpu_data(cpu);
@@ -655,13 +656,15 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
      caps = &cpu_data->perf_caps;
      policy->driver_data = cpu_data;
  +    min = cppc_perf_to_khz(caps, caps->lowest_nonlinear_perf);
+    max = cppc_perf_to_khz(caps, policy->boost_enabled ?
+                   caps->highest_perf : caps->nominal_perf);
+
      /*
       * Set min to lowest nonlinear perf to avoid any efficiency penalty (see
       * Section 8.4.7.1.1.5 of ACPI 6.1 spec)
       */
-    policy->min = cppc_perf_to_khz(caps, caps->lowest_nonlinear_perf);
-    policy->max = cppc_perf_to_khz(caps, policy->boost_enabled ?
-                        caps->highest_perf : caps->nominal_perf);
+    policy->min = min;
        /*
       * Set cpuinfo.min_freq to Lowest to make the full range of performance
@@ -669,7 +672,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
       * nonlinear perf
       */
      policy->cpuinfo.min_freq = cppc_perf_to_khz(caps, caps- >lowest_perf);
-    policy->cpuinfo.max_freq = policy->max;
+    policy->cpuinfo.max_freq = max;
        policy->transition_delay_us = cppc_cpufreq_get_transition_delay_us(cpu);
      policy->shared_type = cpu_data->shared_type;
diff --git a/drivers/cpufreq/cpufreq-nforce2.c b/drivers/cpufreq/ cpufreq-nforce2.c
index fbbbe501cf2dc..831102522ad64 100644
--- a/drivers/cpufreq/cpufreq-nforce2.c
+++ b/drivers/cpufreq/cpufreq-nforce2.c
@@ -355,8 +355,8 @@ static int nforce2_cpu_init(struct cpufreq_policy *policy)
          min_fsb = NFORCE2_MIN_FSB;
        /* cpuinfo and default policy values */
-    policy->min = policy->cpuinfo.min_freq = min_fsb * fid * 100;
-    policy->max = policy->cpuinfo.max_freq = max_fsb * fid * 100;
+    policy->cpuinfo.min_freq = min_fsb * fid * 100;
+    policy->cpuinfo.max_freq = max_fsb * fid * 100;
        return 0;
  }
diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 44eb1b7e7fc1b..b30bfa3e27daa 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1453,6 +1453,14 @@ static int cpufreq_policy_online(struct cpufreq_policy *policy,
      cpumask_and(policy->cpus, policy->cpus, cpu_online_mask);
        if (new_policy) {
+        unsigned int min, max;
+
+        /* Use policy->min/max set by the driver as QoS requests. */
+        min = max(FREQ_QOS_MIN_DEFAULT_VALUE, policy->min);
+        if (policy->max)
+            max = min(FREQ_QOS_MAX_DEFAULT_VALUE, policy->max);
+        else
+            max = FREQ_QOS_MAX_DEFAULT_VALUE;


Nit: Using local variables named min/max is confusing here since they
shadow the common min()/max() macros; renaming them (e.g. min_freq
/ max_freq) would improve readability and maintainability.

Ok yes sure

          for_each_cpu(j, policy->related_cpus) {
              per_cpu(cpufreq_cpu_data, j) = policy;
              add_cpu_dev_symlink(policy, j, get_cpu_device(j));
@@ -1469,18 +1477,25 @@ static int cpufreq_policy_online(struct cpufreq_policy *policy,
            ret = freq_qos_add_request(&policy->constraints,
                         &policy->min_freq_req, FREQ_QOS_MIN,
-                       FREQ_QOS_MIN_DEFAULT_VALUE);
+                       min);

It seems that the current patch is not merely a superficial cleanup; it
also changes the policy->min value in the GX driver, setting it to the
5% value expected by the driver. If so, we should document it in the
commit message.

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/
tree/drivers/cpufreq/gx-suspmod.c#n137

/* gx-suspmod.c constants:
 *   POLICY_MIN_DIV = 20
 *   max_duration   = 255
 *   maxfreq = e.g. 300000 kHz (300 MHz)
 */

cpufreq_policy_online()
  cpufreq_driver->init(policy)      /* cpufreq_gx_cpu_init() */
    policy->min = maxfreq/20        /* 15000 kHz, 5% */
    cpuinfo.min_freq = maxfreq/255  /* 1176 kHz, 0.39% */

  /* Before current patch: 0, not policy->min */
  freq_qos_add_request(..., FREQ_QOS_MIN, 0)

  cpufreq_init_policy()
    cpufreq_set_policy()
      /* reads QoS=0, discards init()'s 15000 */
      new_data.min = freq_qos_read_value(FREQ_QOS_MIN)
      cpufreq_gx_verify()
        cpufreq_verify_within_cpu_limits()
          /* 0 < 1176: clamp to hw floor */
          new_data.min = cpuinfo.min_freq  /* 1176 kHz */
      WRITE_ONCE(policy->min, 1176)  /* 0.39%, not 5% */

After current patch:
freq_qos_add_request(..., FREQ_QOS_MIN, policy->min)
  => new_data.min stays 15000, no clamping, policy->min = 15000

Prior to [1], the policy->min/max values were used as QoS constraint.
This patch effectively set a min QoS constraint, but it should be no
different from what the driver was setting initially.

I will add a reference to the patch + explanation in the commit
message to avoid any ambiguities.

[1] 521223d8b3ec ("cpufreq: Fix initialization of min and max frequency QoS requests")



Thanks, that makes sense to me.


          if (ret < 0)
              goto out_destroy_policy;
            ret = freq_qos_add_request(&policy->constraints,
                         &policy->max_freq_req, FREQ_QOS_MAX,
-                       FREQ_QOS_MAX_DEFAULT_VALUE);
+                       max);
          if (ret < 0)
              goto out_destroy_policy;
blocking_notifier_call_chain(&cpufreq_policy_notifier_list,
                  CPUFREQ_CREATE_POLICY, policy);
+
+        /*
+         * If the driver didn't set QoS constraints, policy->min/max still
+         * need to be set as they are used to clamp frequency requests.
+         */
+        policy->min = policy->min ? policy->min : policy- >cpuinfo.min_freq;
+        policy->max = policy->max ? policy->max : policy- >cpuinfo.max_freq;


Does it make sense to set policy->min / policy->max before the
CPUFREQ_CREATE_POLICY notifier, since some drivers may use them in their
callbacks?

https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/ tree/Documentation/cpu-freq/core.rst#n58


Yes right indeed, this would be better.

Thanks for the review



--
Thx and BRs,
Zhongqiu Han