Re: [PATCH 1/3] cpufreq: Add a callback to update the min_freq_req from drivers

From: Dhananjay Ugwekar
Date: Mon Oct 07 2024 - 00:40:35 EST


Hello Rafael,

On 10/4/2024 11:47 PM, Rafael J. Wysocki wrote:
> On Thu, Oct 3, 2024 at 10:44 AM Dhananjay Ugwekar
> <Dhananjay.Ugwekar@xxxxxxx> wrote:
>>
>> Currently, there is no proper way to update the initial lower frequency
>> limit from cpufreq drivers.
>
> Why do you want to do it?

We want the initial lower frequency limit to be a more efficient level
(lowest_nonlinear_freq) rather than the lowest frequency. This helps save
power in some idle scenarios and also improves benchmark results in some
cases. At the same time, we want to allow the user to set the lower limit
back to the inefficient lowest frequency.
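
As a rough illustration, here is a small userspace model (not kernel code)
of the two approaches, using the example numbers from the patch description
quoted below. All function names are hypothetical, and the second case
assumes that no other min_freq request sits above the core's request.

```c
/*
 * Userspace model only: mimics how min_freq QoS votes are aggregated.
 * Nothing here is kernel API; every name is made up for illustration.
 */
#include <assert.h>
#include <stddef.h>

/* Effective minimum = highest of all registered min_freq votes (kHz). */
unsigned int effective_min_khz(const unsigned int *votes, size_t n)
{
	unsigned int max = 0;

	assert(votes != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (votes[i] > max)
			max = votes[i];
	return max;
}

/*
 * Case A: the driver holds its own request at driver_khz, and the user
 * writes user_khz to scaling_min_freq (the core's request).
 */
unsigned int min_with_driver_request(unsigned int driver_khz,
				     unsigned int user_khz)
{
	unsigned int votes[] = { driver_khz, user_khz };

	return effective_min_khz(votes, 2);
}

/*
 * Case B: the driver only supplies the initial value of the core's
 * request; a later user write replaces that value.
 * Assumption: no other request is above the core's request.
 */
unsigned int min_with_init_value(unsigned int init_khz, int user_wrote,
				 unsigned int user_khz)
{
	unsigned int votes[] = { user_wrote ? user_khz : init_khz };

	return effective_min_khz(votes, 1);
}
```

With a driver-side request at 1200000 KHz, a user write of 1000000 KHz has
no effect in case A, while in case B the same write lowers the effective
minimum to 1000000 KHz.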

Thanks,
Dhananjay

>
>> The only way is to add a new min_freq QoS
>> request from the driver side, but that leads to the issue explained below.
>>
>> The QoS infrastructure collates the constraints from multiple
>> subsystems and saves them in a plist. The "current value" is defined to
>> be the highest value in the plist for min_freq constraint.
>>
>> The cpufreq core adds a qos_request for min_freq of 0, and the amd-pstate
>> driver today adds a QoS request for min_freq of lowest_freq, where
>> lowest_freq corresponds to CPPC.lowest_perf.
>>
>> E.g., taking the amd-pstate driver as an example, suppose lowest_freq is
>> 400000 KHz and lowest_nonlinear_freq is 1200000 KHz.
>>
>> At this point, the min_freq QoS plist looks like:
>>
>> head--> 400000 KHz (registered by amd-pstate) --> 0 KHz (registered by
>> cpufreq core)
>>
>> When a user updates /sys/devices/system/cpu/cpuX/cpufreq/scaling_min_freq,
>> it only results in updating the cpufreq-core's node in the plist, where
>> say 0 becomes the newly echoed value.
>>
>> Now, if the user echoes a value of 1000000 KHz to scaling_min_freq, then
>> the new list would be
>>
>> head--> 1000000 KHz (registered by cpufreq core) --> 400000 KHz (registered
>> by amd-pstate)
>>
>> and the new "current value" of the min_freq QoS constraint will be 1000000
>> KHz. This is the scenario where it works as expected.
>>
>> If we change the amd-pstate driver's min_freq QoS constraint to
>> lowest_nonlinear_freq instead of lowest_freq, then the user will
>> never be able to request a value below that, for the following reason:
>>
>> At boot time, the min_freq QoS plist would be
>>
>> head--> 1200000 KHz (registered by amd-pstate) --> 0 KHz (registered by
>> cpufreq core)
>>
>> When the user echoes a value of 1000000 KHz to
>> /sys/devices/..../scaling_min_freq, then the new list would be
>>
>> head--> 1200000 KHz (registered by amd-pstate) --> 1000000 KHz (registered
>> by cpufreq core)
>>
>> with the new "current value" of the min_freq QoS remaining 1200000 KHz.
>
> Yes, that's how frequency QoS works.
>
>> Since the current value has not changed, there won't be any notifications
>> sent to the subsystems which have added their QoS constraints. In
>> particular, the amd-pstate driver will not get the notification, and thus,
>> the user's request to lower the scaling_min_freq will be ineffective.
>
> The value written by user space to scaling_min_freq is a vote, not a
> request. It may not be physically possible to reduce the frequency
> below a certain minimum level that need not be known to the user.
>
>> Hence, it is advisable to have a single source of truth for the min and
>> max freq QoS constraints between the cpufreq and the cpufreq drivers.
>>
>> So add a new callback, get_init_min_freq(), in struct cpufreq_driver,
>> which allows amd-pstate (or any other cpufreq driver) to override the
>> default min_freq value being set in the policy->min_freq_req. The user
>> can then still modify scaling_min_freq later to any value (lower or
>> higher than the init value) if desired.
>>
>> Signed-off-by: Dhananjay Ugwekar <Dhananjay.Ugwekar@xxxxxxx>
>> ---
>> drivers/cpufreq/cpufreq.c | 6 +++++-
>> include/linux/cpufreq.h | 6 ++++++
>> 2 files changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
>> index f98c9438760c..2923068cf5f4 100644
>> --- a/drivers/cpufreq/cpufreq.c
>> +++ b/drivers/cpufreq/cpufreq.c
>> @@ -1361,6 +1361,7 @@ static int cpufreq_online(unsigned int cpu)
>> bool new_policy;
>> unsigned long flags;
>> unsigned int j;
>> + u32 init_min_freq = FREQ_QOS_MIN_DEFAULT_VALUE;
>> int ret;
>>
>> pr_debug("%s: bringing CPU%u online\n", __func__, cpu);
>> @@ -1445,9 +1446,12 @@ static int cpufreq_online(unsigned int cpu)
>> goto out_destroy_policy;
>> }
>>
>> + if (cpufreq_driver->get_init_min_freq)
>> + init_min_freq = cpufreq_driver->get_init_min_freq(policy);
>> +
>> ret = freq_qos_add_request(&policy->constraints,
>> policy->min_freq_req, FREQ_QOS_MIN,
>> - FREQ_QOS_MIN_DEFAULT_VALUE);
>> + init_min_freq);
>> if (ret < 0) {
>> /*
>> * So we don't call freq_qos_remove_request() for an
>> diff --git a/include/linux/cpufreq.h b/include/linux/cpufreq.h
>> index e0e19d9c1323..b20488b55f6c 100644
>> --- a/include/linux/cpufreq.h
>> +++ b/include/linux/cpufreq.h
>> @@ -414,6 +414,12 @@ struct cpufreq_driver {
>> * policy is properly initialized, but before the governor is started.
>> */
>> void (*register_em)(struct cpufreq_policy *policy);
>> +
>> + /*
>> + * Set by drivers that want to initialize the policy->min_freq_req with
>> + * a value different from the default value (0) in cpufreq core.
>> + */
>> + int (*get_init_min_freq)(struct cpufreq_policy *policy);
>> };
>>
>> /* flags */
>> --
>> 2.34.1
>>