Re: [RFC PATCH 06/16] sched/schedutil: Add a new tunable to dictate response time
From: Christian Loehle
Date: Tue Sep 17 2024 - 06:22:24 EST
On 9/16/24 23:22, Dietmar Eggemann wrote:
> On 20/08/2024 18:35, Qais Yousef wrote:
>> The new tunable, response_time_ms, allow us to speed up or slow down
>> the response time of the policy to meet the perf, power and thermal
>> characteristic desired by the user/sysadmin. There's no single universal
>> trade-off that we can apply for all systems even if they use the same
>> SoC. The form factor of the system, the dominant use case, and in case
>> of battery powered systems, the size of the battery and presence or
>> absence of active cooling can play a big role on what would be best to
>> use.
>>
>> The new tunable provides sensible defaults, but yet gives the power to
>> control the response time to the user/sysadmin, if they wish to.
>>
>> This tunable is applied before we apply the DVFS headroom.
>>
>> The default behavior of applying 1.25 headroom can be re-instated easily
>> now. But we continue to keep the min required headroom to overcome
>> hardware limitation in its speed to change DVFS. And any additional
>> headroom to speed things up must be applied by userspace to match their
>> expectation for best perf/watt as it dictates a type of policy that will
>> be better for some systems, but worse for others.
>>
>> There's a whitespace clean up included in sugov_start().
>>
>> Signed-off-by: Qais Yousef <qyousef@xxxxxxxxxxx>
>> ---
>> Documentation/admin-guide/pm/cpufreq.rst | 17 +++-
>> drivers/cpufreq/cpufreq.c | 4 +-
>> include/linux/cpufreq.h | 3 +
>> kernel/sched/cpufreq_schedutil.c | 115 ++++++++++++++++++++++-
>> 4 files changed, 132 insertions(+), 7 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/pm/cpufreq.rst b/Documentation/admin-guide/pm/cpufreq.rst
>> index 6adb7988e0eb..fa0d602a920e 100644
>> --- a/Documentation/admin-guide/pm/cpufreq.rst
>> +++ b/Documentation/admin-guide/pm/cpufreq.rst
>> @@ -417,7 +417,7 @@ is passed by the scheduler to the governor callback which causes the frequency
>> to go up to the allowed maximum immediately and then draw back to the value
>> returned by the above formula over time.
>>
>> -This governor exposes only one tunable:
>> +This governor exposes two tunables:
>>
>> ``rate_limit_us``
>> Minimum time (in microseconds) that has to pass between two consecutive
>> @@ -427,6 +427,21 @@ This governor exposes only one tunable:
>> The purpose of this tunable is to reduce the scheduler context overhead
>> of the governor which might be excessive without it.
>>
>> +``respone_time_ms``
s/respone/response
>> + Amount of time (in milliseconds) required to ramp the policy from
>> + lowest to highest frequency. Can be decreased to speed up the
> ^^^^^^^^^^^^^^^^^
>
> This has changed IMHO. Should be the time from lowest (or better 0) to
> second highest frequency.
>
> https://lkml.kernel.org/r/20230827233203.1315953-6-qyousef@xxxxxxxxxxx
>
> [...]
>
Isn't it even more complicated than that?
The headroom is applied on top of response_time_ms, so response_time_ms
will be longer than the time it actually takes to reach the highest-capacity
OPP.
Furthermore, on a big CPU with e.g. an OPP0 capacity of 200, starting
from 0 is (usually?) irrelevant, as we likely wouldn't be here if we were at 0.
I get the intent, but conveying this in an understandable interface is hard.
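
To put rough numbers on the two points above, here is a back-of-the-envelope
sketch. It is only an illustration under stated assumptions, not what the
series computes: it uses a continuous approximation of PELT with a 32 ms
half-life, a capacity scale of 1024, and the 1.25 headroom mentioned in the
patch description as an example value. The helper names are hypothetical and
do not exist in the kernel.

```python
import math

# Assumptions for illustration only (not taken from the patch):
PELT_HALFLIFE_MS = 32.0        # default PELT half-life
SCHED_CAPACITY_SCALE = 1024.0  # full capacity of the biggest CPU


def ramp_time_ms(util_from, util_to):
    """Time for an always-running task's util to grow from util_from to
    util_to, using the continuous approximation
    util(t) = 1024 * (1 - 2^(-t / 32)).
    util_to must be below 1024, which is only reached asymptotically."""
    def time_from_zero(util):
        return -PELT_HALFLIFE_MS * math.log2(1.0 - util / SCHED_CAPACITY_SCALE)
    return time_from_zero(util_to) - time_from_zero(util_from)


def util_requesting_max(headroom):
    """Util at which util * headroom already asks for the highest OPP."""
    return SCHED_CAPACITY_SCALE / headroom


if __name__ == "__main__":
    target = util_requesting_max(1.25)  # 819.2 with the example headroom
    # Ramp from 0: roughly 74 ms until the highest OPP is requested.
    print(f"0   -> {target:.1f}: {ramp_time_ms(0, target):.1f} ms")
    # Ramp from the example OPP0 capacity of 200: roughly 64 ms.
    print(f"200 -> {target:.1f}: {ramp_time_ms(200, target):.1f} ms")
```

With these assumptions the highest OPP is requested once util crosses about
819, well before util itself gets near 1024, and starting from 200 instead of
0 shortens the ramp by about 10 ms. Both effects make a single "lowest to
highest frequency" number hard to document precisely.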