RE: [PATCH] cpufreq: schedutil: Don't skip freq update when limits change

From: Doug Smythies
Date: Thu Aug 01 2019 - 13:57:57 EST


On 2019.07.31 23:17 Viresh Kumar wrote:
> On 31-07-19, 17:20, Doug Smythies wrote:
>> Summary:
>>
>> The old way, using UINT_MAX had two purposes: first,
>> as a "need to do a frequency update" flag; but also second, to
>> force any subsequent old/new frequency comparison to NOT be "the same,
>> so why bother actually updating" (see: sugov_update_next_freq). All
>> patches so far have been dealing with the flag, but only partially
>> the comparisons. In a busy system, and when schedutil.c doesn't actually
>> know the currently set system limits, the new frequency is dominated by
>> values the same as the old frequency. So, when sugov_fast_switch calls
>> sugov_update_next_freq, false is usually returned.
>
> And finally we know "Why" :)
>
> Good work Doug. Thanks for taking it to the end.
>
>> However, if we move the resetting of the flag and add another condition
>> to the "no need to actually update" decision, then perhaps this patch
>> version 1 will be O.K. It seems to be. (see way later in this e-mail).
>
>> With all this new knowledge, how about going back to
>> version 1 of this patch, and then adding this:
>>
>> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
>> index 808d32b..f9156db 100644
>> --- a/kernel/sched/cpufreq_schedutil.c
>> +++ b/kernel/sched/cpufreq_schedutil.c
>> @@ -100,7 +100,12 @@ static bool sugov_should_update_freq(struct sugov_policy *sg_policy, u64 time)
>> static bool sugov_update_next_freq(struct sugov_policy *sg_policy, u64 time,
>> unsigned int next_freq)
>> {
>> - if (sg_policy->next_freq == next_freq)
>> + /*
>> + * Always force an update if the flag is set, regardless.
>> + * In some implementations (intel_cpufreq) the frequency is clamped
>> + * further downstream, and might not actually be different here.
>> + */
>> + if (sg_policy->next_freq == next_freq && !sg_policy->need_freq_update)
>> return false;
>
> This is not correct because this is an optimization we have in place
> to make things more efficient. And it was working by luck earlier and
> my patch broke it for good :)

Disagree.
All I did was use a flag where next_freq used to be set to UINT_MAX,
to implement basically the same thing.
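
To make the equivalence concrete, here is a minimal user-space sketch,
not kernel code. The struct and the function names are hypothetical
stand-ins for struct sugov_policy and sugov_update_next_freq, and
clearing the flag inside the update function is an assumption made only
to keep the model small (this e-mail does not show where the real patch
clears it).

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's struct sugov_policy. */
struct model_policy {
	unsigned int next_freq;      /* last frequency handed to the driver */
	bool need_freq_update;       /* set when the policy limits change */
};

/* Old scheme: a limits change poisons next_freq with UINT_MAX ... */
static void old_limits_changed(struct model_policy *p)
{
	p->next_freq = UINT_MAX;
}

/* ... so the "same as before" comparison can never match afterwards. */
static bool old_update_next_freq(struct model_policy *p, unsigned int next_freq)
{
	if (p->next_freq == next_freq)
		return false;
	p->next_freq = next_freq;
	return true;
}

/* Flag scheme: a limits change only sets the flag ... */
static void flag_limits_changed(struct model_policy *p)
{
	p->need_freq_update = true;
}

/* ... and the comparison is bypassed while the flag is still set. */
static bool flag_update_next_freq(struct model_policy *p, unsigned int next_freq)
{
	if (p->next_freq == next_freq && !p->need_freq_update)
		return false;
	p->next_freq = next_freq;
	p->need_freq_update = false; /* assumed placement, see note above */
	return true;
}
```

In this model both variants skip a request that repeats the old
frequency, both force exactly one update after a limits change, and
both go back to skipping afterwards.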

> Things need to get a bit more synchronized and something like this may
> help (completely untested):
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index cc27d4c59dca..2d84361fbebc 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -2314,6 +2314,18 @@ static int intel_cpufreq_target(struct cpufreq_policy *policy,
> return 0;
> }
>
> +static unsigned int intel_cpufreq_resolve_freq(struct cpufreq_policy *policy,
> + unsigned int target_freq)
> +{
> + struct cpudata *cpu = all_cpu_data[policy->cpu];
> + int target_pstate;
> +
> + target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling);
> + target_pstate = intel_pstate_prepare_request(cpu, target_pstate);
> +
> + return target_pstate * cpu->pstate.scaling;
> +}
> +
> static unsigned int intel_cpufreq_fast_switch(struct cpufreq_policy *policy,
> unsigned int target_freq)
> {
> @@ -2350,6 +2362,7 @@ static struct cpufreq_driver intel_cpufreq = {
> .verify = intel_cpufreq_verify_policy,
> .target = intel_cpufreq_target,
> .fast_switch = intel_cpufreq_fast_switch,
> + .resolve_freq = intel_cpufreq_resolve_freq,
> .init = intel_cpufreq_cpu_init,
> .exit = intel_pstate_cpu_exit,
> .stop_cpu = intel_cpufreq_stop_cpu,
>
> -------------------------8<-------------------------
>
> Please try this with my patch 2.

O.K.

> We need patch 2 instead of 1 because
> of another race condition Rafael noticed.

Disagree.
Notice that my modifications to your patch1 address
that condition by moving the clearing of "need_freq_update"
to a later point.
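
The race Rafael noticed is not described in this e-mail, so the sketch
below does not try to reproduce it. It only shows, in a hypothetical
user-space model, why the point at which the flag is cleared matters
for the extra condition: if the flag is cleared before the old/new
comparison runs, the comparison can no longer see it. All names here
(order_policy, order_update, CLEAR_EARLY, CLEAR_LATE) are invented for
illustration.

```c
#include <stdbool.h>

/* Where the model clears need_freq_update: both points are assumptions. */
enum clear_point { CLEAR_EARLY, CLEAR_LATE };

/* Hypothetical, simplified stand-in for struct sugov_policy. */
struct order_policy {
	unsigned int next_freq;
	bool need_freq_update;
};

/* Stage 1: compute the next frequency; may clear the flag too early. */
static unsigned int order_get_next_freq(struct order_policy *p,
					unsigned int raw_freq,
					enum clear_point when)
{
	if (when == CLEAR_EARLY)
		p->need_freq_update = false;
	return raw_freq;
}

/* Stage 2: the old/new comparison, with the extra flag condition. */
static bool order_commit(struct order_policy *p, unsigned int next_freq,
			 enum clear_point when)
{
	bool forced = p->need_freq_update;

	if (when == CLEAR_LATE)
		p->need_freq_update = false;
	if (p->next_freq == next_freq && !forced)
		return false;
	p->next_freq = next_freq;
	return true;
}

/* One governor pass through both stages. */
static bool order_update(struct order_policy *p, unsigned int raw_freq,
			 enum clear_point when)
{
	unsigned int next_freq = order_get_next_freq(p, raw_freq, when);

	return order_commit(p, next_freq, when);
}
```

With the flag set and an unchanged frequency request, the CLEAR_EARLY
ordering skips the update, while CLEAR_LATE forces it once and then
returns to normal behaviour.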

>
> cpufreq_schedutil calls driver specific resolve_freq() to find the new
> target frequency and this is where the limits should get applied IMO.

Oh! I didn't know. But yes, that makes sense.
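
As a way of checking my understanding, here is a user-space sketch of
the arithmetic in the quoted intel_cpufreq_resolve_freq. It assumes
that intel_pstate_prepare_request amounts to clamping the P-state into
the current min/max range, which I have not verified against the
driver. The struct, the helper names and the 100000 kHz scaling value
used in the example are all hypothetical.

```c
/* Same rounding helper the quoted patch uses, redefined for user space. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical, simplified stand-in for the driver's per-CPU data. */
struct model_cpu {
	unsigned int scaling;     /* kHz per P-state step */
	unsigned int min_pstate;  /* lowest P-state currently allowed */
	unsigned int max_pstate;  /* highest P-state currently allowed */
};

static unsigned int model_clamp(unsigned int v, unsigned int lo,
				unsigned int hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * Convert the requested frequency to a P-state (rounding up), apply the
 * limits (assumed to be a plain clamp), and convert back to kHz.
 */
static unsigned int model_resolve_freq(const struct model_cpu *cpu,
				       unsigned int target_freq)
{
	unsigned int pstate = DIV_ROUND_UP(target_freq, cpu->scaling);

	pstate = model_clamp(pstate, cpu->min_pstate, cpu->max_pstate);
	return pstate * cpu->scaling;
}
```

With scaling = 100000 and a P-state range of 8 to 38, a request for
3500000 kHz resolves to 3500000. After the maximum is lowered to 20,
the same request resolves to 2000000, so the old/new comparison in
schedutil sees a different value and the update is not skipped.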

>
> Rafael can help with reviewing this diff but it would be great if you
> can give this a try Doug.

Anyway, I added the above code (I am calling it patch3) to patch2, as
you asked, and it does work. I also added it to my modified patch1,
additionally removing the extra condition check that I had added
(i.e. all that remains of my patch1 modifications is the moved
clearing of "need_freq_update"). That kernel also worked for both
intel_cpufreq/schedutil and acpi-cpufreq/schedutil.

Again, I do not know how to test the original issue that led
to the change away from UINT_MAX in the first place (commit
ecd2884291261e3fddbc7651ee11a20d596bb514). That issue should be
retested in case these changes have introduced a regression.

... Doug