Re: [PATCH 0/2] cpufreq: governor: Apply limits with target_freq instead of policy->cur

From: zhenglifeng (A)

Date: Mon Mar 09 2026 - 22:26:44 EST


On 3/9/2026 8:32 PM, Rafael J. Wysocki wrote:
> On Mon, Mar 9, 2026 at 9:27 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
>>
>> On 10-02-26, 19:54, Lifeng Zheng wrote:
>>> The motivation for this patchset cames from a test on our platform:
>>>
>>> With conservative governor and some pressure on CPU, the frequency rapidly
>>> reach the max supported frequency, such as 2GHz.
>>>
>>> Later, some frequency division strategies on our platform were triggered
>>> and the actual frequency become 500MHz -- 1/4 of the OS distribution
>>> frequency.
>>>
>>> At that time, if someone excecutes 'cat cpuinfo_cur_freq', the actual
>>> frequency will become 250MHz -- 1/4 of the min supported frequency.
>>>
>>> After the platform recovering from the frequency division, the frequency
>>> will stay on 1GHz, until the pressure disappear.
>>>
>>> The reason this happens is that in cpufreq_verify_current_freq(), if
>>> policy->cur != new_freq, policy->update will be queued, which will
>>> ultimately lead to a call to cpufreq_policy_apply_limits(), and update the
>>> target frequency to policy->min. And then in cs_dbs_update(), since the
>>> pressure never vanish, it will always hit the following branches:
>>>
>>> if (load > dbs_data->up_threshold) {
>>> dbs_info->down_skip = 0;
>>>
>>> /* if we are already at full speed then break out early */
>>> if (requested_freq == policy->max)
>>> goto out;
>>>
>>> Therefore, the target frequency will always remain at the lowest frequency.
>>>
>>> The branching conditions in cs_dbs_update() may not be strict enough, but
>>> the root cause of this problem is that the target frequency was updated
>>> when querying cpuinfo_cur_freq. For ondemand and schedutil governor,
>>> although the frequency will not always remain at the lowest level without
>>> rising, will still be min_freq in a short period of time when the query
>>> action occurs.
>>>
>>> Using the freq requested by the governor to decide whether to update the
>>> target frequency is more reasonable in cpufreq_policy_apply_limits().
>>
>> I think I understand the problem now. We are tracking the current
>> frequency state via two cached values, policy->cur and requested_freq
>> and a mismatch (because of your hardware specific quirks/features)
>> between them is making things tricky.
>>
>> Rafael, will this break anything we can think about ?
>
> I can't recall, but the new code is simpler, so unless anyone has a
> particular heartburn with it, go for it.

Hi Viresh,

Sorry, I don't think the new code can totally solve the problem I met.

In the example I showed above, policy->cur will become 1GHz when someone
query cpuinfo_cur_freq, because the real freq is 500MHz. After that,
conservative governor has to take some time to get to the max freq again
step by step in your code. This means the real freq will shake whenever
cpuinfo_cur_freq is queried. This situation also occurs with ondemand and
schedutil governor, but they don't need to increase the frequency step by
step.

I still believe using policy->cur to update target in
cpufreq_policy_apply_limits() is not very reasonable, because OS never
knows what the platform will do to affect the final frequency. OS should
determine the new target frequency based on the previous decision.

>
>> diff --git a/drivers/cpufreq/cpufreq_conservative.c b/drivers/cpufreq/cpufreq_conservative.c
>> index e0e847764511..c69577e4f941 100644
>> --- a/drivers/cpufreq/cpufreq_conservative.c
>> +++ b/drivers/cpufreq/cpufreq_conservative.c
>> @@ -14,7 +14,6 @@
>> struct cs_policy_dbs_info {
>> struct policy_dbs_info policy_dbs;
>> unsigned int down_skip;
>> - unsigned int requested_freq;
>> };
>>
>> static inline struct cs_policy_dbs_info *to_dbs_info(struct policy_dbs_info *policy_dbs)
>> @@ -59,10 +58,10 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
>> {
>> struct policy_dbs_info *policy_dbs = policy->governor_data;
>> struct cs_policy_dbs_info *dbs_info = to_dbs_info(policy_dbs);
>> - unsigned int requested_freq = dbs_info->requested_freq;
>> struct dbs_data *dbs_data = policy_dbs->dbs_data;
>> struct cs_dbs_tuners *cs_tuners = dbs_data->tuners;
>> unsigned int load = dbs_update(policy);
>> + unsigned int requested_freq = policy->cur;
>> unsigned int freq_step;
>>
>> /*
>> @@ -72,16 +71,6 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
>> if (cs_tuners->freq_step == 0)
>> goto out;
>>
>> - /*
>> - * If requested_freq is out of range, it is likely that the limits
>> - * changed in the meantime, so fall back to current frequency in that
>> - * case.
>> - */
>> - if (requested_freq > policy->max || requested_freq < policy->min) {
>> - requested_freq = policy->cur;
>> - dbs_info->requested_freq = requested_freq;
>> - }
>> -
>> freq_step = get_freq_step(cs_tuners, policy);
>>
>> /*
>> @@ -113,7 +102,6 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
>>
>> __cpufreq_driver_target(policy, requested_freq,
>> CPUFREQ_RELATION_HE);
>> - dbs_info->requested_freq = requested_freq;
>> goto out;
>> }
>>
>> @@ -137,7 +125,6 @@ static unsigned int cs_dbs_update(struct cpufreq_policy *policy)
>>
>> __cpufreq_driver_target(policy, requested_freq,
>> CPUFREQ_RELATION_LE);
>> - dbs_info->requested_freq = requested_freq;
>> }
>>
>> out:
>> @@ -310,7 +297,6 @@ static void cs_start(struct cpufreq_policy *policy)
>> struct cs_policy_dbs_info *dbs_info = to_dbs_info(policy->governor_data);
>>
>> dbs_info->down_skip = 0;
>> - dbs_info->requested_freq = policy->cur;
>> }
>>
>> static struct dbs_governor cs_governor = {
>>
>> -------------------------8<-------------------------
>>
>> This always pick the next freq based on policy->cur instead of the
>> real last request. The two can differ if:
>> - the hardware plays with current frequency, as is the case here.
>> - or the limits change and that changes the current frequency (in
>> which case we will be at policy->min/max anyway).
>>
>> --
>> viresh
>