Re: [Question]: about 'cpuinfo_cur_freq' shown in sysfs when the CPU is in idle state

From: Rafael J. Wysocki
Date: Wed Jun 03 2020 - 09:40:03 EST


On Wed, Jun 3, 2020 at 9:52 AM Viresh Kumar <viresh.kumar@xxxxxxxxxx> wrote:
>
> On 02-06-20, 11:34, Xiongfeng Wang wrote:
> > Hi Viresh,
> >
> > Sorry to disturb you about another problem as follows.
> >
> > CPPC use the increment of Desired Performance counter and Reference Performance
> > counter to get the CPU frequency and show it in sysfs through
> > 'cpuinfo_cur_freq'. But ACPI CPPC doesn't specifically define the behavior of
> > these two counters when the CPU is in idle state, such as stop incrementing when
> > the CPU is in idle state.
> >
> > ARMv8.4 Extension inctroduced support for the Activity Monitors Unit (AMU). The
> > processor frequency cycles and constant frequency cycles in AMU can be used as
> > Delivered Performance counter and Reference Performance counter. These two
> > counter in AMU does not increase when the PE is in WFI or WFE. So the increment
> > is zero when the PE is in WFI/WFE. This cause no issue because
> > 'cppc_get_rate_from_fbctrs()' in cppc_cpufreq driver will check the increment
> > and return the desired performance if the increment is zero.
> >
> > But when the CPU goes into power down idle state, accessing these two counters
> > in AMU by memory-mapped address will return zero. Such as CPU1 went into power
> > down idle state and CPU0 try to get the frequency of CPU1. In this situation,
> > will display a very big value for 'cpuinfo_cur_freq' in sysfs. Do you have some
> > advice about this problem ?
> >
> > I was thinking about an idea as follows. We can run 'cppc_cpufreq_get_rate()' on
> > the CPU to be measured, so that we can make sure the CPU is in C0 state when we
> > access the two counters. Also we can return the actual frequency rather than
> > desired performance when the CPU is in WFI/WFE. But this modification will
> > change the existing logical and I am not sure if this will cause some bad effect.
> >
> >
> > diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
> > index 257d726..ded3bcc 100644
> > --- a/drivers/cpufreq/cppc_cpufreq.c
> > +++ b/drivers/cpufreq/cppc_cpufreq.c
> > @@ -396,9 +396,10 @@ static int cppc_get_rate_from_fbctrs(struct cppc_cpudata *cpu,
> > return cppc_cpufreq_perf_to_khz(cpu, delivered_perf);
> > }
> >
> > -static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
> > +static int cppc_cpufreq_get_rate_cpu(void *info)
> > {
> > struct cppc_perf_fb_ctrs fb_ctrs_t0 = {0}, fb_ctrs_t1 = {0};
> > + unsigned int cpunum = *(unsigned int *)info;
> > struct cppc_cpudata *cpu = all_cpu_data[cpunum];
> > int ret;
> >
> > @@ -418,6 +419,22 @@ static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
> > return cppc_get_rate_from_fbctrs(cpu, fb_ctrs_t0, fb_ctrs_t1);
> > }
> >
> > +static unsigned int cppc_cpufreq_get_rate(unsigned int cpunum)
> > +{
> > + unsigned int ret;
> > +
> > + ret = smp_call_on_cpu(cpunum, cppc_cpufreq_get_rate_cpu, &cpunum, true);
> > +
> > + /*
> > + * convert negative error code to zero, otherwise we will display
> > + * an odd value for 'cpuinfo_cur_freq' in sysfs
> > + */
> > + if (ret < 0)
> > + ret = 0;
> > +
> > + return ret;
> > +}
> > +
> > static int cppc_cpufreq_set_boost(struct cpufreq_policy *policy, int state)
> > {
> > struct cppc_cpudata *cpudata;
>
> I don't see any other sane solution, even if this brings the CPU back
> to normal state and waste power. We should be able to reliably provide
> value to userspace.
>
> Rafael / Sudeep: What you do say ?

The frequency value obtained by kicking the CPU out of idle
artificially is bogus, though. You may as well return a random number
instead.

The frequency of a CPU in an idle state is in fact unknown in the case
at hand, so returning 0 looks like the cleanest option to me.

Thanks!