Re: [PATCH] sched/schedutil: Don't set next_freq to UINT_MAX

From: Rafael J. Wysocki
Date: Wed May 09 2018 - 04:56:25 EST


On Wed, May 9, 2018 at 10:41 AM, Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
> On 08-05-18, 22:54, Rafael J. Wysocki wrote:
>> On Tue, May 8, 2018 at 8:42 AM, Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
>> > The schedutil driver sets sg_policy->next_freq to UINT_MAX on certain
>> > occasions:
>> > - In sugov_start(), when the schedutil governor is started for a group
>> > of CPUs.
>> > - And whenever we need to force a freq update before rate-limit
>> > duration, which happens when:
>> > - there is an update in cpufreq policy limits.
>> > - Or when the utilization of DL scheduling class increases.
>> >
>> > In return, get_next_freq() doesn't return a cached next_freq value but
>> > instead recalculates the next frequency. This has some side effects
>> > though and may significantly delay a required increase in frequency.
>> >
>> > In sugov_update_single() we try to avoid decreasing frequency if the CPU
>> > has not been idle recently. Consider this scenario, the available range
>> > of frequencies for a CPU are from 800 MHz to 2.5 GHz and current
>> > frequency is 800 MHz. From one of the call paths
>> > sg_policy->need_freq_update is set to true and hence
>> > sg_policy->next_freq is set to UINT_MAX. Now if the CPU had been busy,
>> > next_f will always be less than UINT_MAX, whatever the value of next_f
>> > is. And so even when we wanted to increase the frequency, we will
>> > overwrite next_f with UINT_MAX and will not change the frequency
>> > eventually. This will continue until the time CPU stays busy. This isn't
>> > cross checked with any specific test cases, but rather based on general
>> > code review.
>> >
>> > Fix that by not resetting the sg_policy->need_freq_update flag from
>> > sugov_should_update_freq() but get_next_freq() and we wouldn't need to
>> > overwrite sg_policy->next_freq anymore.
>> >
>> > Cc: 4.12+ <stable@xxxxxxxxxxxxxxx> # 4.12+
>> > Fixes: b7eaf1aab9f8 ("cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely")
>>
>> The rest of the chantelog is totally disconnected from this commit.
>
> I added the "Fixes" tag because this is exactly the commit after which
> this problem started, isn't it?
>
>> So the problem is that sugov_update_single() doesn't check the special
>> UNIT_MAX value before assigning sg_policy->next_freq to next_f. Fair
>> enough.
>>
>> I don't see why the patch is the right fix for that, however.
>
> I thought not overwriting next_freq makes things much simpler and easy
> to review.

I'm kind of concerned about updating the limits via sysfs in which
case the cached next frequency may be out of range, so it's better to
invalidate it right away then.

> What else do you have in mind to solve this problem ?

Something like the below?

---
kernel/sched/cpufreq_schedutil.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-pm/kernel/sched/cpufreq_schedutil.c
===================================================================
--- linux-pm.orig/kernel/sched/cpufreq_schedutil.c
+++ linux-pm/kernel/sched/cpufreq_schedutil.c
@@ -305,7 +305,8 @@ static void sugov_update_single(struct u
* Do not reduce the frequency if the CPU has not been idle
* recently, as the reduction is likely to be premature then.
*/
- if (busy && next_f < sg_policy->next_freq) {
+ if (busy && next_f < sg_policy->next_freq &&
+ sg_policy->next_freq != UINT_MAX) {
next_f = sg_policy->next_freq;

/* Reset cached freq as next_freq has changed */