Re: [RFC/RFT] [PATCH 02/10] cpufreq: intel_pstate: Conditional frequency invariant accounting

From: Srinivas Pandruvada
Date: Thu May 17 2018 - 10:44:50 EST


On Thu, 2018-05-17 at 17:04 +0200, Juri Lelli wrote:
> On 17/05/18 12:59, Juri Lelli wrote:
> > On 16/05/18 18:31, Juri Lelli wrote:
> > > On 16/05/18 17:47, Peter Zijlstra wrote:
> > > > On Wed, May 16, 2018 at 05:19:25PM +0200, Juri Lelli wrote:
> > > >
> > > > > Anyway, FWIW I started testing this on an E5-2609 v3 and I'm not
> > > > > seeing hackbench regressions so far (running with the schedutil
> > > > > governor).
> > > >
> > > > https://en.wikipedia.org/wiki/Haswell_(microarchitecture)#Server_processors
> > > >
> > > > Lists the E5 2609 v3 as not having turbo at all, which is basically
> > > > a best-case scenario for this patch.
> > > >
> > > > As I wrote earlier today: when turbo exists, as on, say, the 2699,
> > > > then when we're busy we'll run at U = 2.3/3.6 ~ .64, which might
> > > > confuse things.
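A minimal sketch of the arithmetic above, assuming frequency invariance is expressed in the scheduler's capacity units (1024 = full capacity), that the assumed maximum is the 3.6 GHz turbo frequency, and that a busy all-core run sustains only 2.3 GHz. `freq_scale()` is a hypothetical helper for illustration, not code from the patch.

```python
SCHED_CAPACITY_SCALE = 1024  # full capacity in the scheduler's capacity units


def freq_scale(cur_freq_ghz: float, max_freq_ghz: float) -> int:
    """Hypothetical helper: frequency-invariance factor in capacity units."""
    # Truncate to an integer, in the spirit of fixed-point kernel arithmetic.
    return int(cur_freq_ghz * SCHED_CAPACITY_SCALE / max_freq_ghz)


# Fully busy at the sustained all-core frequency, normalized against turbo max.
scale = freq_scale(2.3, 3.6)
print(scale, scale / SCHED_CAPACITY_SCALE)
```

Under these assumptions a fully busy CPU accrues only about 654 of 1024 (roughly 0.64). The scale reaches 1024 only if the turbo frequency is actually sustained.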
> > >
> > > Indeed. I was mostly trying to see if adding this to the tick might
> > > introduce noticeable overhead.
> >
> > Blindly testing on an i5-5200U (2.2/2.7 GHz) gave the following:
> >
> > # perf bench sched messaging --pipe --thread --group 2 --loop 20000
> >
> >                       count       mean       std     min     50%       95%       99%     max
> > hostname kernel
> > i5-5200U test_after    30.0  13.843433  0.590605  12.369  13.810  14.85635  15.08205  15.127
> >          test_before   30.0  13.571167  0.999798  12.228  13.302  15.57805  16.40029  16.690
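As a quick sanity check on these numbers, the mean shift and a Welch t-statistic can be computed from the summary columns alone. This assumes the 30 runs per kernel are independent, and that lower is better because the benchmark reports elapsed time.

```python
import math

n = 30  # runs per kernel, from the "count" column
before_mean, before_std = 13.571167, 0.999798
after_mean, after_std = 13.843433, 0.590605

# Relative change of the mean (positive means the patched kernel took longer).
rel_change = (after_mean - before_mean) / before_mean

# Welch's t-statistic for two independent samples with unequal variances.
t = (after_mean - before_mean) / math.sqrt(before_std**2 / n + after_std**2 / n)

print(f"mean change: {rel_change:+.1%}, Welch t = {t:.2f}")
```

This gives a mean roughly 2.0% higher after the patch, with t of about 1.28. For about 30 samples per side, that is below the usual threshold of roughly 2, so on these numbers alone the difference is not clearly distinguishable from run-to-run noise.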
> >
> > It might be interesting to see what happens when using only a single CPU.
> >
> > Also, I will look at how the util signals look when a single CPU is
> > busy.
>
> And this is showing where the problem is (as you were saying [1]):
>
> https://gist.github.com/jlelli/f5438221186e5ed3660194e4f645fe93
>
> Just look at the plots (and ignore setup).
>
> First one (pid:4483) shows a single busy task running on a single CPU,
> which seems able to sustain turbo for 5 sec, so task util reaches
> ~1024.
>
> Second one (pid:4283) shows the same task, but running together with
> 3 other tasks (each one pinned to a different CPU). In this case util
> saturates at ~943, because max freq is still considered to be the
> turbo one. :/
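The plateau can be illustrated with a simplified PELT-style model. In this model, utilization is a geometric average updated once per period with a 32-period half-life, and each fully busy period contributes running time scaled by the frequency-invariance factor. This is an assumption-laden sketch, not the kernel's PELT code, which works in 1024 µs segments with fixed-point sums. The 943 scale below is hypothetical, chosen only to match the plateau in the plot.

```python
# Simplified PELT-style utilization: a geometric moving average updated once
# per period, with decay factor Y chosen so that Y**32 == 0.5 (32-period half-life).
Y = 0.5 ** (1 / 32)
SCHED_CAPACITY_SCALE = 1024


def run_always(freq_scale: int, periods: int) -> float:
    """Utilization of a task that is busy in every period (illustrative model)."""
    util = 0.0
    for _ in range(periods):
        # Each fully busy period contributes freq_scale (not 1024), so the
        # average converges to freq_scale instead of full capacity.
        util = util * Y + freq_scale * (1 - Y)
    return util


print(round(run_always(SCHED_CAPACITY_SCALE, 1000)))  # approaches 1024
print(round(run_always(943, 1000)))  # hypothetical lower scale: plateaus near 943
```

In this model, a task that never sleeps converges to the frequency-invariance factor itself. Util can therefore reach 1024 only while the CPU runs at the frequency treated as maximum.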


One more point to note. Even if we calculate some utilization based on
the frequency-invariant accounting and arrive at a P-state, we will not
be able to control any P-state in the turbo region (not even as a cap)
on several Intel processors using the PERF_CTL MSRs.
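To illustrate why this matters, here is a sketch of how schedutil maps utilization to a frequency request. It assumes the formula used in its get_next_freq(), next_freq = 1.25 * max_freq * util / max, with max_freq taken as the turbo maximum when frequency invariance is enabled. The 2.3 GHz base and 3.6 GHz turbo figures from the 2699 example are reused purely as hypothetical inputs, and the final clamp is simplified.

```python
BASE_KHZ = 2_300_000   # hypothetical non-turbo (base) frequency, 2699 example
TURBO_KHZ = 3_600_000  # hypothetical maximum turbo frequency, same example
SCHED_CAPACITY_SCALE = 1024


def next_freq_khz(util: int, max_freq_khz: int = TURBO_KHZ,
                  max_cap: int = SCHED_CAPACITY_SCALE) -> int:
    # 25% headroom, written as freq + (freq >> 2) like schedutil does.
    freq = max_freq_khz + (max_freq_khz >> 2)
    # Simplified clamp to the policy maximum (the real code resolves via the driver).
    return min(freq * util // max_cap, max_freq_khz)


for util in (400, 524, 654, 1024):
    f = next_freq_khz(util)
    print(util, f, "turbo range" if f > BASE_KHZ else "non-turbo")
```

Under these assumptions, any util of 524 or more already asks for a frequency in the turbo range. That is exactly the region where, as noted above, the request cannot be enforced through PERF_CTL.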


>
> [1] https://marc.info/?l=linux-kernel&m=152646464017810&w=2