Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely

From: Rafael J. Wysocki
Date: Mon May 08 2017 - 18:22:52 EST


On Monday, May 08, 2017 09:31:19 AM Viresh Kumar wrote:
> On 08-05-17, 11:49, Wanpeng Li wrote:
> > Hi Rafael,
> > 2017-03-22 7:08 GMT+08:00 Rafael J. Wysocki <rjw@xxxxxxxxxxxxx>:
> > > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > >
> > > The way the schedutil governor uses the PELT metric causes it to
> > > underestimate the CPU utilization in some cases.
> > >
> > > That can be easily demonstrated by running kernel compilation on
> > > a Sandy Bridge Intel processor, running turbostat in parallel with
> > > it and looking at the values written to the MSR_IA32_PERF_CTL
> > > register. Namely, the expected result would be that when all CPUs
> > > were 100% busy, all of them would be requested to run in the maximum
> > > P-state, but observation shows that this clearly isn't the case.
> > > The CPUs run in the maximum P-state for a while and then are
> > > requested to run slower and go back to the maximum P-state after
> > > a while again. That causes the actual frequency of the processor to
> > > visibly oscillate below the sustainable maximum in a jittery fashion
> > > which clearly is not desirable.
> > >
> > > That has been attributed to CPU utilization metric updates on task
> > > migration that cause the total utilization value for the CPU to be
> > > reduced by the utilization of the migrated task. If that happens,
> > > the schedutil governor may see a CPU utilization reduction and will
> > > attempt to reduce the CPU frequency accordingly right away. That
> > > may be premature, though, for example if the system is generally
> > > busy and there are other runnable tasks waiting to be run on that
> > > CPU already.
> > >
> > > This is unlikely to be an issue on systems where cpufreq policies are
> > > shared between multiple CPUs, because in those cases the policy
> > > utilization is computed as the maximum of the CPU utilization values
> >
> > Sorry for one question maybe not associated with this patch. If the
> > cpufreq policy is shared between multiple CPUs, the function
> > intel_cpufreq_target() just updates IA32_PERF_CTL MSR of the cpu
> > which is managing this policy, I wonder whether other cpus which are
> > affected should also update their per-logical cpu's IA32_PERF_CTL MSR?
>
> The CPUs share the policy when they share their freq/voltage rails and so
> changing perf state of one CPU should result in that changing for all the CPUs
> in that policy. Otherwise, they can't be considered to be part of the same
> policy.

To be entirely precise, this depends on the granularity of the HW interface.

If the interface is per-logical-CPU, we will use it that way for efficiency
reasons.  Even if there is some coordination on the HW side, the information
on how exactly it works is usually limited.
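
As a rough illustration of the distinction, here is a user-space toy model.
It is not kernel code and every name in it is made up for the example.  It
assumes a shared policy takes the maximum of the per-CPU utilization values
and issues one request through the managing CPU, while a per-logical-CPU
interface gets one request per CPU.  Mapping utilization directly to the
requested value is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one logical CPU; not a kernel structure. */
struct cpu_model {
	unsigned int util;      /* per-CPU utilization, arbitrary units */
	unsigned int perf_ctl;  /* stand-in for the per-CPU request register */
};

/* Shared policy: utilization is the maximum over the CPUs in the policy. */
unsigned int policy_util(const struct cpu_model *cpus, size_t n)
{
	unsigned int max = 0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		if (cpus[i].util > max)
			max = cpus[i].util;
	return max;
}

/*
 * Policy-wide interface: a single request is issued through the managing
 * CPU (index 0 in this model).  What the other CPUs end up running at is
 * left to the hardware.  Returns the number of register writes.
 */
size_t request_shared(struct cpu_model *cpus, size_t n, unsigned int value)
{
	assert(n > 0);
	cpus[0].perf_ctl = value;
	return 1;
}

/*
 * Per-logical-CPU interface: every CPU gets its own request, derived here
 * from its own utilization.  Returns the number of register writes.
 */
size_t request_per_cpu(struct cpu_model *cpus, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		cpus[i].perf_ctl = cpus[i].util;
	return n;
}
```

In the shared case only the managing CPU's register is touched, so the model
says nothing about what the other CPUs do; that part depends on the hardware.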

Thanks,
Rafael