Re: [PATCH] cpufreq: intel_pstate: Optimize IO boost in non HWP mode

From: Francisco Jerez
Date: Tue Sep 11 2018 - 13:54:33 EST


"Rafael J. Wysocki" <rjw@xxxxxxxxxxxxx> writes:

> On Thursday, September 6, 2018 6:20:08 AM CEST Francisco Jerez wrote:
>>
>> Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx> writes:
>>
>> > [...]
>> >
>> >> > >
>> >> > > This patch causes a number of statistically significant
>> >> > > regressions (with significance of 1%) on the two systems I've
>> >> > > tested it on. On my
>> >> >
>> >> > Sure. These patches are targeted to Atom clients where some of
>> >> > these server like workload may have some minor regression on few
>> >> > watts TDP parts.
>> >>
>> >> Neither the 36% regression of fs-mark, the 21% regression of sqlite,
>> >> nor the 10% regression of warsaw qualify as small. And most of the
>> >> test cases on the list of regressions aren't exclusively server-like,
>> >> if at all. Warsaw, gtkperf, jxrendermark and lightsmark are all
>> >> graphics benchmarks -- Latency is as important if not more for
>> >> interactive workloads than it is for server workloads. In the case of
>> >> a conflict like the one we're dealing with right now between
>> >> optimizing for throughput (e.g. for the maximum number of requests
>> >> per second) and optimizing for latency (e.g. for the minimum request
>> >> duration), you are more likely to be concerned about the former than
>> >> about the latter in a server setup.
>> >
>> > Eero,
>> > Please add your test results here.
>> >
>> > No matter which algorithm you do, there will be variations. So you have
>> > to look at the platforms which you are targeting. For this platform
>> > number one item is use of less turbo and hope you know why?
>>
>> Unfortunately the current controller uses turbo frequently on Atoms for
>> TDP-limited graphics workloads regardless of IOWAIT boosting. IOWAIT
>> boosting simply exacerbated the pre-existing energy efficiency problem.
>
> My current understanding of the issue at hand is that using IOWAIT boosting
> on Atoms is a regression relative to the previous behavior.

Not universally. IOWAIT boosting helps under roughly the same
conditions on Atom as it does on big core, so applying this patch will
necessarily cause regressions too (see my reply from Sep. 3 for some
numbers). It also won't completely restore the previous behavior, since
it simply decreases the degree of IOWAIT boosting applied without being
able to avoid it. Cf. the series I'm working on, which does something
similar to IOWAIT boosting when it's able to determine the workload is
actually CPU-bound. That prevents energy-inefficient behavior for
non-CPU-bound workloads that don't benefit from a higher CPU clock
frequency anyway.

> That is what Srinivas is trying to address here AFAICS.
>
> Now, you seem to be saying that the overall behavior is suboptimal and the
> IOWAIT boosting doesn't matter that much,

I was just saying that IOWAIT boosting is less than half of the energy
efficiency problem, and this patch only partially addresses that half of
the problem.

> so some deeper changes are needed anyway. That may be the case, but
> if there is a meaningful regression, we should first get back to the
> point where it is not present and then to take care of the more
> general problems.
>
> So, I'd like to understand how much of a problem the IOWAIT boosting really is
> in the first place. If it is significant enough, let's address it first, this
> way or another, and move on to the other problems subsequently.
>

See the Unigine and Gfxbench numbers I provided in my reply from Sep. 3
to get an idea of the magnitude of the IOWAIT boosting problem vs. the
overall energy efficiency problem addressed by my series.

> Thanks,
> Rafael
