Re: [PATCH v1 00/13] perf/x86/amd: Add AMD Fam19h Branch Sampling support

From: Stephane Eranian
Date: Thu Oct 28 2021 - 14:30:44 EST


On Wed, Sep 15, 2021 at 2:04 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Tue, Sep 14, 2021 at 10:55:12PM -0700, Stephane Eranian wrote:
> > On Thu, Sep 9, 2021 at 1:55 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Sep 09, 2021 at 12:56:47AM -0700, Stephane Eranian wrote:
> > > > This patch series adds support for the AMD Fam19h 16-deep branch sampling
> > > > feature as described in the AMD PPR Fam19h Model 01h Revision B1 section 2.1.13.
> > >
> > > Yay..
> > >
> > > > BRS interacts with the NMI interrupt as well. Because enabling BRS is expensive,
> > > > it is only activated after P event occurrences, where P is the desired sampling period.
> > > > At P occurrences of the event, the counter overflows, the CPU catches the NMI interrupt,
> > > > activates BRS for 16 branches until it saturates, and then delivers the NMI to the kernel.
> > >
> > > WTF... ?!? Srsly? You're joking right?
> > >
> >
> > As I said, this is because of the cost of running BRS usually for
> > millions of branches to keep only the last 16.
> > Running branch sampling in general on any arch is never totally free.
>
> Holding up the NMI will disrupt the sampling of the other events, which
> is, IMO, unacceptable and would require this event to be exclusive on the
> whole PMU, simply because sharing it doesn't work.
>
Sorry for the long delay, I have been very busy.

You are right on this. It would hold the NMI for 16 taken branches.
Making the event exclusive creates a problem with the NMI watchdog, which
itself occupies a PMU counter. We can try to hack something in to allow the
NMI watchdog plus the sampling event and nothing else.
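
To make the hold-off concrete, here is a toy model of the sequence described
in the cover letter: the counter overflows after P events, BRS then records
16 taken branches, and only then is the NMI delivered. This is a rough sketch,
not the hardware or the patch code. The names (simulate_brs, Sample) are
hypothetical, and it assumes the sampled event is itself a taken branch and
that counting only resumes once the NMI has been delivered.

```python
from dataclasses import dataclass
from typing import Iterable, List

BRS_DEPTH = 16  # "16-deep" branch sampling, per the cover letter


@dataclass
class Sample:
    overflow_at: int    # index of the branch on which the counter overflowed
    nmi_at: int         # index of the branch after which the NMI is delivered
    records: List[int]  # indices of the taken branches captured by BRS


def simulate_brs(taken: Iterable[bool], period: int,
                 depth: int = BRS_DEPTH) -> List[Sample]:
    """Toy model of overflow -> BRS capture -> delayed NMI.

    Assumptions (not from the hardware spec): the sampled event is a
    taken branch, and the period restarts only after the NMI fires.
    """
    samples: List[Sample] = []
    count = 0
    overflow_at = None  # None while counting, set while BRS is capturing
    records: List[int] = []

    for i, is_taken in enumerate(taken):
        if not is_taken:
            continue  # not-taken branches neither count nor get recorded
        if overflow_at is None:
            count += 1
            if count == period:
                # Counter overflow: the NMI is held and BRS starts capturing.
                overflow_at = i
                records = []
        else:
            records.append(i)
            if len(records) == depth:
                # BRS saturated: the NMI is finally delivered.
                samples.append(Sample(overflow_at, i, records))
                count = 0
                overflow_at = None
    return samples


if __name__ == "__main__":
    for s in simulate_brs([True] * 100, period=10):
        print(s.overflow_at, s.nmi_at, s.nmi_at - s.overflow_at)
```

In this model the hold-off is always 16 taken branches, but the number of
instructions executed in that window depends on how often branches are taken,
so any other event sharing the PMU sees its NMI delayed by a variable amount.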

> (also, other NMI sources might object)
>
On AMD, IBS Op and IBS Fetch also both fire on NMI, but that is less of
a concern because the instruction address is captured by IBS itself and
the interrupted IP is not useful, so the interrupt skid is not important.

> Also, by only having LBRs post overflow you can't apply LBR based
> analysis to other events, which seems quite limiting.
>
This is very limited functionality designed to support basic sampling,
primarily for autoFDO, where there is only one sampling event.

> This really seems like a very sub-optimal solution. I mean, it's awesome
> AMD gets branch records, but this seems a very poor solution.

For now, this is what we have. It is important to get some basic form of branch
sampling on Zen3 even if it is not perfect because it enables optimizations such
as autoFDO for compilers today. We have verified that autoFDO works well with
branch sampling on Zen3.

I hope it will improve in the future.