Re: [PATCH v9] pgo: add clang's Profile Guided Optimization infrastructure

From: Peter Zijlstra
Date: Mon Jun 14 2021 - 06:57:03 EST


On Mon, Jun 14, 2021 at 02:39:41AM -0700, Bill Wendling wrote:
> On Mon, Jun 14, 2021 at 2:01 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:

> > Because having GCOV, KCOV and PGO all do essentially the same thing
> > differently, makes heaps of sense?
> >
> It does when you're dealing with one toolchain without access to another.

Here's a sekrit, don't tell anyone, but you can get a free copy of GCC
right here:

https://gcc.gnu.org/

We also have this linux-toolchains list (Cc'ed now) that contains folks
from both sides.

> > I understand that the compilers actually generates radically different
> > instrumentation for the various cases, but essentially they're all
> > collecting (function/branch) arcs.
> >
> That's true, but there's no one format for profiling data that's
> usable between all compilers. I'm not even sure there's a good way to
> translate between, say, gcov and llvm's format. To make matters more
> complicated, each compiler's format is tightly coupled to a specific
> version of that compiler. And depending on *how* the data is collected
> (e.g. sampling or instrumentation), it may not give us the full
> benefit of FDO/PGO.

I'm thinking that something simple like:

struct arc {
u64 from;
u64 to;
u64 nr;
u64 cntrs[0];
};

goes a very long way. Stick a header on that says how large cntrs[] is,
and some other data (like load offset and whatnot) and you should be
good.

Combine that with the executable image (say /proc/kcore) to recover
what's @from (call, jmp or conditional branch) and I'm thinking one
ought to be able to construct lots of useful data.

I've also been led to believe that the KCOV data format is not in fact
dependent on which toolchain is used.

> > I'm thinking it might be about time to build _one_ infrastructure for
> > that and define a kernel arc format and call it a day.
> >
> That may be nice, but it's a rather large request.

Given GCOV just died, perhaps you can look at what KCOV does and see if
that can be extended to do as you want. KCOV is actively used and
we actually tripped over all the fun little noinstr bugs at the time.