Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
From: Patrick Bellasi
Date: Mon Mar 20 2017 - 09:07:10 EST
On 20-Mar 13:50, Peter Zijlstra wrote:
> On Mon, Mar 20, 2017 at 01:35:12PM +0100, Rafael J. Wysocki wrote:
> > On Monday, March 20, 2017 11:36:45 AM Peter Zijlstra wrote:
> > > On Sun, Mar 19, 2017 at 02:34:32PM +0100, Rafael J. Wysocki wrote:
> > > > From: Rafael J. Wysocki <rafael.j.wysocki@xxxxxxxxx>
> > > >
> > > > The PELT metric used by the schedutil governor underestimates the
> > > > CPU utilization in some cases. The reason for that may be time spent
> > > > in interrupt handlers and similar which is not accounted for by PELT.
> > > >
> > > > That can be easily demonstrated by running kernel compilation on
> > > > a Sandy Bridge Intel processor, running turbostat in parallel with
> > > > it and looking at the values written to the MSR_IA32_PERF_CTL
> > > > register. Namely, the expected result would be that when all CPUs
> > > > were 100% busy, all of them would be requested to run in the maximum
> > > > P-state, but observation shows that this clearly isn't the case.
> > > > The CPUs run in the maximum P-state for a while and then are
> > > > requested to run slower and go back to the maximum P-state after
> > > > a while again. That causes the actual frequency of the processor to
> > > > visibly oscillate below the sustainable maximum in a jittery fashion
> > > > which clearly is not desirable.
> > > >
> > > > To work around this issue use the observation that, from the
> > > > schedutil governor's perspective, CPUs that are never idle should
> > > > always run at the maximum frequency and make that happen.
> > > >
> > > > To that end, add a counter of idle calls to struct sugov_cpu and
> > > > modify cpuidle_idle_call() to increment that counter every time it
> > > > is about to put the given CPU into an idle state. Next, make the
> > > > schedutil governor look at that counter for the current CPU every
> > > > time before it is about to start heavy computations. If the counter
> > > > has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms),
> > > > the CPU has not been idle for at least that long and the governor
> > > > will choose the maximum frequency for it without looking at the PELT
> > > > metric at all.
> > >
> > > Why the time limit?
> >
> > One iteration appeared to be a bit too aggressive, but honestly I think
> > I need to check again if this thing is regarded as viable at all.
> >
>
> I don't hate the idea; if we don't hit idle; we shouldn't shift down. I
> just wonder if we don't already keep a idle-seqcount somewhere; NOHZ and
> RCU come to mind as things that might already use something like that.
Maybe the problem is not going down (e.g. when there are only small
CFS tasks it makes perfectly sense) but instead not being fast enough
on rampin-up when a new RT task is activated.
And this boils down to two main point:
1) throttling for up transitions perhaps is only harmful
2) the call sites for schedutils updates are not properly positioned
in specific scheduler decision points.
The proposed patch is adding yet another throttling mechanism, perhaps
on top of one which already needs to be improved.
--
#include <best/regards.h>
Patrick Bellasi