On Fri, Jul 1, 2022 at 4:49 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Fri, Jul 01, 2022 at 03:17:54AM -0700, Bill Wendling wrote:
> > On Fri, Jul 1, 2022 at 2:02 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Jun 28, 2022 at 07:08:48PM +0200, Jose E. Marchesi wrote:
> > > >
> > > > [Added linux-toolchains@vger in CC]
> > > >
> > > > It would be interesting to have some discussion in the Toolchains track
> > > > on building the kernel with PGO/FDO. I have seen a rise in interest in
> > > > the topic at several companies, but it would make very little sense if
> > > > no kernel hacker is interested in participating... anybody?
> > >
> > > I know there's been a lot of work in this area, but none of it seems to
> > > have trickled down to be easy enough for me to use it.
> >
> > We use an instrumented kernel to collect the data we need. It gives us
> > the best payoff, because the profiling data is more fine-grained and
> > accurate. (PGO does much more than make inlining decisions.)
> >
> > If I recall correctly, you previously suggested using sampling data.
> > (Correct?) Is there a document or article that outlines that process?
>
> IIRC Google has LBR sample driven PGO somewhere as well. ISTR that being
> the whole motivation for that gruesome Zen3 BRS hack.
>
> Google got me this: https://research.google.com/pubs/archive/45290.pdf

Right. However, there's a chicken-and-egg issue with AutoFDO for the
production kernel. We can't release a kernel that hasn't been compiled
with PGO/FDO. We could only release it in a test environment, in which
case we could use AutoFDO. However, the document says that AutoFDO
only reaches ~90% of FDO. They list some reasons for this, but
nonetheless I suspect that the delta would be too severe for us to
release the kernel.

As for LBR, that will work with Intel/AMD, but I thought that LBR
doesn't exist for Arm processors (my knowledge could be out of date on
this).

What would make PGO (sample-based or instrumented) easy enough for you
to use? What key elements are missing?

-bw