Re: [GIT PULL] Clang feature updates for v5.14-rc1

From: Bill Wendling
Date: Fri Jul 02 2021 - 08:47:03 EST


On Tue, Jun 29, 2021 at 2:04 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, Jun 29, 2021 at 1:44 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > >
> > > And it causes the kernel to be bigger and run slower.
> >
> > Right -- that's expected. It's not designed to be the final kernel
> > someone uses. :)
>
> Well, from what I've seen, you actually want to run real loads in
> production environments for PGO to actually be anything but a bogus
> "performance benchmarks only" kind of thing.
>
We use PGO this way because we _cannot_ release a kernel into
production that hasn't had PGO applied to it. The performance of a
non-PGO'ed kernel is a non-starter for rollout. We try our best to
replicate the production environment for the benchmarks, which is the
only sane way to do this. I can't imagine that we're the only ones who
run up against this chicken-and-egg problem.

As for why we don't use sampling: PGO gives a better performance boost
from a profile collected on an instrumented kernel than from a sampled
profile. I'll work on getting statistics to show this.

-bw

> Of course, "performance benchmarks only" is very traditional, and
> we've seen that used over and over in the past in this industry. That
> doesn't make it _right_, though.
>
> And if you actually want to have it usable in production environments,
> you really should strive to run code as closely as possible to a
> production kernel too.
>
> You'd want to run something that you can sample over time, and in
> production, not something that you have to build a special kernel for
> that then gets used for a benchmark run, but can't be kept in
> production because it performs so much worse.
>
> Real proper profiles will tell you what *really* matters - and if you
> don't have enough samples to give you good information, then that
> particular code clearly is not important enough to waste PGO on.
>
> This is not all that dissimilar to using gprof information for
> traditional - manual - optimizations.
>
> Sure, instrumented gprof output is better than nothing, but it is
> *hugely* worse than actual proper sampled profiles that actually show
> what matters for performance (as opposed to what runs a lot - the two
> are not necessarily all that closely correlated, with cache misses
> being a thing).
>
> And I really hate how pretty much all of the PGO support seems to be
> just about this inferior method of getting the data.
>
> Linus