RE: [PATCH] cpufreq: intel_pstate: Fix cpuinfo_cur_freq after performance governor changes

From: Huaisheng HS1 Ye
Date: Mon Jul 24 2017 - 11:33:12 EST


Hi Rafael,
Thanks for your reply.

> On Monday, July 24, 2017 05:43:14 AM Huaisheng HS1 Ye wrote:
> > After commit 82b4e03e01bc (intel_pstate: skip scheduler hook when in
> > "performance" mode) Software P-state control modes couldn't get
> > dynamic value during performance mode,
>
> Please explain what you mean here.
>
commit 82b4e03e01bc (intel_pstate: skip scheduler hook when in
"performance" mode) disables intel_pstate_set_update_util_hook when current
policy is performance within function intel_pstate_set_policy. It leads to
Software P-states couldn't update sysfs interface cpuinfo_cur_freq's value
during performance mode, because of pstate_funcs.update_util couldn't set
for the given CPU.

> I guess you carried out some tests and the results were not as expected, so
> what was the test?
Exactly, we check the sysfs interface cpuinfo_cur_freq and the output of
cpupower frequency-info both with performance mode.

For example, we list logic CPU 63's sysfs interface and output of cpupower with performance
mode below,
analyzing CPU 63:
driver: intel_pstate
CPUs which run at the same hardware frequency: 63
CPUs which need to have their frequency coordinated by software: 63
maximum transition latency: Cannot determine or is not supported.
hardware limits: 1000 MHz - 3.70 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 1000 MHz and 3.70 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency: 1.07 GHz (asserted by call to hardware) <--------cpuinfo_cur_freq
boost state support:
Supported: yes
Active: yes

[root@localhost cpufreq]# pwd
/sys/devices/system/cpu/cpu63/cpufreq
[root@localhost cpufreq]# cat cpuinfo_cur_freq scaling_cur_freq
1070379
2796179

The value of cpuinfo_cur_freq is static always during performance mode, because of
"cpu->sample.core_avg_perf" doesn't have chance to be updated when driver leaves
from powersave mode. Its value equals to last sample result which comes from function
intel_pstate_calc_avg_perf.
commit f8475cef (x86: use common aperfmperf_khz_on_cpu() to calculate KHz using
APERF/MPERF) offers a great method to help scaling_cur_freq gets accurate frequency.
And for some tools like cpupower which would try to get cpuinfo_cur_freq at first to show
current frequency, if it fails then it would move to scaling_cur_freq which represents the
real value.

>
> > and it still in last value from powersave mode, so clear its value to
> > get same behavior as Hardware P-state to avoid confusion.
>
> And please explain why it should be fixed the way you've done that.
>

In order to make sure that cpuinfo_cur_freq couldn't get an effective value,
we clean up "sample.core_avg_perf" when cpu's policy set to performance mode.
so the reading of sysfs cpuinfo_cur_freq would get "<unknown>" always within
performance mode, which is same with the status when HWP has been enabled.

With this patch, logic CPU 63's sysfs interface and output of cpupower with performance
mode would be like this,

analyzing CPU 63:
driver: intel_pstate
CPUs which run at the same hardware frequency: 63
CPUs which need to have their frequency coordinated by software: 63
maximum transition latency: Cannot determine or is not supported.
hardware limits: 1000 MHz - 3.70 GHz
available cpufreq governors: performance powersave
current policy: frequency should be within 1000 MHz and 3.70 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency: Unable to call hardware <--------cpuinfo_cur_freq
current CPU frequency: 2.80 GHz (asserted by call to kernel) <--------scaling_cur_freq
boost state support:
Supported: yes
Active: yes

[root@localhost cpufreq]# pwd
/sys/devices/system/cpu/cpu63/cpufreq
[root@localhost cpufreq]# cat cpuinfo_cur_freq scaling_cur_freq
<unknown>
2800000

> > Signed-off-by: Huaisheng Ye <yehs1@xxxxxxxxxx>
> > ---
> > drivers/cpufreq/intel_pstate.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/cpufreq/intel_pstate.c
> > b/drivers/cpufreq/intel_pstate.c index 6cd5035..c675626 100644
> > --- a/drivers/cpufreq/intel_pstate.c
> > +++ b/drivers/cpufreq/intel_pstate.c
> > @@ -2050,6 +2050,7 @@ static int intel_pstate_set_policy(struct
> cpufreq_policy *policy)
> > */
> > intel_pstate_clear_update_util_hook(policy->cpu);
> > intel_pstate_max_within_limits(cpu);
> > + cpu->sample.core_avg_perf = 0;
> > } else {
> > intel_pstate_set_update_util_hook(policy->cpu);
> > }
> >