Re: [Question]: about 'cpuinfo_cur_freq' shown in sysfs when the CPU is in idle state

From: Xiongfeng Wang
Date: Wed Jun 03 2020 - 21:32:55 EST


Hi Rafael,

Thanks for your reply!

On 2020/6/3 21:39, Rafael J. Wysocki wrote:
> On Wed, Jun 3, 2020 at 9:52 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
>>
>> On 02-06-20, 11:34, Xiongfeng Wang wrote:
>>> Hi Viresh,
>>>
>>> Sorry to disturb you about another problem as follows.
>>>
>>> CPPC uses the increments of the Delivered Performance counter and the Reference
>>> Performance counter to compute the CPU frequency and shows it in sysfs through
>>> 'cpuinfo_cur_freq'. But ACPI CPPC doesn't specifically define the behavior of
>>> these two counters when the CPU is in an idle state, for example whether they
>>> stop incrementing.
>>>
>>> The ARMv8.4 extension introduced support for the Activity Monitors Unit (AMU).
>>> The processor frequency cycles and constant frequency cycles in the AMU can be
>>> used as the Delivered Performance counter and Reference Performance counter.
>>> These two counters in the AMU do not increase when the PE is in WFI or WFE, so
>>> the increment is zero when the PE is in WFI/WFE. This causes no issue because
>>> 'cppc_get_rate_from_fbctrs()' in the cppc_cpufreq driver checks the increment
>>> and returns the desired performance if the increment is zero.
>>>
>>> But when the CPU goes into a power-down idle state, accessing these two
>>> counters in the AMU through their memory-mapped addresses returns zero. For
>>> example, CPU1 goes into a power-down idle state and CPU0 tries to get the
>>> frequency of CPU1. In this situation, sysfs displays a very big value for
>>> 'cpuinfo_cur_freq'. Do you have any advice about this problem?
>>>
>>> I was thinking about an idea as follows. We can run 'cppc_cpufreq_get_rate()' on
>>> the CPU to be measured, so that we can make sure the CPU is in the C0 state when
>>> we access the two counters. We can also return the actual frequency rather than
>>> the desired performance when the CPU is in WFI/WFE. But this modification will
>>> change the existing logic and I am not sure whether it will cause some bad effect.
>>>
>>>
>>> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
>>> index 257d726..ded3bcc 100644
>>> --- a/drivers/cpufreq/cppc_cpufreq.c
>>> +++ b/drivers/cpufreq/cppc_cpufreq.c
>>> @@ -396,9 +396,10 @@ static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
>>> return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
>>> }
>>>
>>> -static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
>>> +static int cppc_cpufreq_get_rate_cpu(void *info)
>>> {
>>> struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
>>> + unsigned int cpunum = *(unsigned int *)info;
>>> struct cppc_cpudata *cpu = all_cpu_data[cpunum];
>>> int ret;
>>>
>>> @@ -418,6 +419,22 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
>>> return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
>>> }
>>>
>>> +static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
>>> +{
>>> + int ret;
>>> +
>>> + ret = smp_call_on_cpu(cpunum, cppc_cpufreq_get_rate_cpu, &cpunum, true);
>>> +
>>> + /*
>>> + * convert negative error code to zero, otherwise we will display
>>> + * an odd value for 'cpuinfo_cur_freq' in sysfs
>>> + */
>>> + if (ret < 0)
>>> + ret = 0;
>>> +
>>> + return ret;
>>> +}
>>> +
>>> static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
>>> {
>>> struct cppc_cpudata *cpudata;
>>
>> I don't see any other sane solution, even if this brings the CPU back
>> to the normal state and wastes power. We should be able to reliably
>> provide a value to userspace.
>>
>> Rafael / Sudeep: What do you say?
>
> The frequency value obtained by kicking the CPU out of idle
> artificially is bogus, though. You may as well return a random number
> instead.

Yes, it may as well return a random number.

>
> The frequency of a CPU in an idle state is in fact unknown in the case
> at hand, so returning 0 looks like the cleanest option to me.

I am not sure how users will use 'cpuinfo_cur_freq' in sysfs. If I return 0
when the CPU is idle, then with a light load running on the CPU, I will read a
zero value for 'cpuinfo_cur_freq' whenever the CPU happens to be idle and a
non-zero value whenever it is not. Users may find it odd that
'cpuinfo_cur_freq' switches between zero and a non-zero value. They may expect
it to return the frequency at which the CPU executes instructions, namely in
the C0 state. But I am not sure how users actually interpret 'cpuinfo_cur_freq'.
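
To make the trade-off concrete, here is a minimal user-space sketch, not the
driver code. 'struct fb_snapshot', 'rate_from_snapshots()' and the 'idle_khz'
parameter are hypothetical names used only for illustration; the real logic
lives in 'cppc_get_rate_from_fbctrs()'. It derives the frequency from the ratio
of the two counter increments and reports 'idle_khz' when the counters did not
advance:

```c
#include <stdint.h>

/* Hypothetical snapshot of the two feedback counters (not the kernel struct). */
struct fb_snapshot {
	uint64_t delivered;	/* cycles counted at the actual core frequency */
	uint64_t reference;	/* cycles counted at the constant reference frequency */
};

/*
 * Estimate the frequency in kHz from two snapshots taken at t0 and t1.
 * ref_khz:  rate at which the reference counter ticks (assumed to be known).
 * idle_khz: value reported when a counter did not advance, i.e. the CPU did
 *           not run during the sampling window. Passing 0 models "return 0
 *           when idle"; passing the desired performance converted to kHz
 *           models the existing fallback described earlier in the thread.
 */
static uint64_t rate_from_snapshots(struct fb_snapshot t0, struct fb_snapshot t1,
				    uint64_t ref_khz, uint64_t idle_khz)
{
	uint64_t d_delivered = t1.delivered - t0.delivered;
	uint64_t d_reference = t1.reference - t0.reference;

	/* Avoid dividing by zero and avoid reporting a ratio of stalled counters. */
	if (d_delivered == 0 || d_reference == 0)
		return idle_khz;

	return ref_khz * d_delivered / d_reference;
}
```

With a reference frequency of 100000 kHz, increments of 2000000 delivered
cycles and 1000000 reference cycles give 200000 kHz. When the counters do not
advance, the caller decides what the reader of 'cpuinfo_cur_freq' sees: 0, as
Rafael suggests, or another value such as the desired performance. The sketch
does not handle counter wraparound or a counter that reads back as zero from a
powered-down CPU.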

Thanks,
Xiongfeng

>
> Thanks!