Re: [PATCH] perf/x86/amd: Change NMI latency mitigation to use a timestamp

From: Peter Zijlstra
Date: Thu Aug 01 2019 - 17:48:22 EST


On Thu, Aug 01, 2019 at 11:34:23PM +0200, Thomas Gleixner wrote:
> On Thu, 1 Aug 2019, Lendacky, Thomas wrote:
> > On 8/1/19 4:16 PM, Peter Zijlstra wrote:
> > > On Thu, Aug 01, 2019 at 06:57:41PM +0000, Lendacky, Thomas wrote:
> > >> From: Tom Lendacky <thomas.lendacky@xxxxxxx>
> > >>
> > >> It turns out that the NMI latency workaround from commit 6d3edaae16c6
> > >> ("x86/perf/amd: Resolve NMI latency issues for active PMCs") ends up
> > >> being too conservative and results in the perf NMI handler claiming NMIs
> > >> too easily on AMD hardware when the NMI watchdog is active.
> > >>
> > >> This has an impact, for example, on the hpwdt (HPE watchdog timer) module.
> > >> This module can produce an NMI that is used to reset the system. It
> > >> registers an NMI handler for the NMI_UNKNOWN type and relies on the fact
> > >> that nothing has claimed an NMI so that its handler will be invoked when
> > >> the watchdog device produces an NMI. After the referenced commit, the
> > >> hpwdt module is unable to process its generated NMI if the NMI watchdog is
> > >> active, because the current NMI latency mitigation results in the NMI
> > >> being claimed by the perf NMI handler.
> > >>
> > >> Update the AMD perf NMI latency mitigation workaround to, instead, use a
> > >> window of time. Whenever a PMC is handled in the perf NMI handler, set a
> > >> timestamp which will act as a perf NMI window. Any NMIs arriving within
> > >> that window will be claimed by perf. Anything outside that window will
> > >> not be claimed by perf. The value for the NMI window is set to 100 msecs.
> > >> This is a conservative value that easily covers any NMI latency in the
> > >> hardware. While this still results in a window in which the hpwdt module
> > >> will not receive its NMI, the window is now much, much smaller.
> > >
> > > Blergh, I so hate all this. The proposed patch is basically duct tape.
> >
> > Yeah, I'm not a fan either.
> >
> > >
> > > The horribly retarded x86 NMI infrastructure strikes again :/
> > >
> > > Tom; do you have any idea how expensive it is to twiddle CR8 and play
> > > games with interrupt priorities instead of piling world + dog on this
> > > one NMI line? (as compared to CLI/STI)
> >
> > I can check on that. What are you thinking?
>
> Avoid the whole NMI mess, make the PMC interrupt a proper vector in the
> highest prio bucket and instead of using CLI/STI use CR8. That would have
> > the additional advantage that we could prevent perf "NMI" then occasionally :)

Exactly, and not only the PMC, we can basically start giving out actual
vectors on register_nmi_handler() and do away with all that shared line
crap.

And then the actual NMI line will be mostly empty again, and it can read
its stupid slow reason port again.

One complication though; IRET et al only do EFLAGS, not CR8, so that's
going to be massive fun :-(
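
To make the CR8 idea concrete, here is a rough userspace toy model of the
masking rule, not kernel code. It assumes the usual x86-64 behaviour that a
vector's priority class is its upper four bits and that a vector is only
delivered when that class is strictly greater than CR8. The names toy_cpu,
vector_deliverable and toy_iret are made up for illustration, and the model
ignores in-service priority entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy CPU model (hypothetical): only the two masking knobs discussed above. */
struct toy_cpu {
	bool    eflags_if;	/* cleared by CLI, set by STI */
	uint8_t cr8;		/* task priority class, 0..15 */
};

/* The priority class of a vector is its upper four bits. */
static uint8_t vector_class(uint8_t vector)
{
	return vector >> 4;
}

/*
 * A maskable vector is delivered only if interrupts are enabled at all
 * (IF set) and its class is strictly above the current CR8 value.
 */
static bool vector_deliverable(const struct toy_cpu *cpu, uint8_t vector)
{
	assert(cpu != NULL);

	if (!cpu->eflags_if)
		return false;

	return vector_class(vector) > cpu->cr8;
}

/*
 * The complication: IRET restores EFLAGS (and so IF) from the saved
 * frame, but leaves CR8 exactly as the interrupted code last set it.
 */
static void toy_iret(struct toy_cpu *cpu, bool saved_if)
{
	assert(cpu != NULL);

	cpu->eflags_if = saved_if;
}
```

With CR8 set to 14 in this model, a vector in the top bucket (say 0xf0)
still gets through while everything below it is held off; CLI blocks both.
After toy_iret() the CR8 value is whatever it was before, which is the part
that would need explicit save/restore.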

Did I say I hates the x86 interrupt scheme?
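
For reference, the timestamp window described in the quoted changelog boils
down to something like the sketch below. This is a standalone illustration
and not the patch itself: the struct and function names are invented, and a
plain millisecond counter stands in for whatever clock the real code uses.
Only the 100 msec value comes from the changelog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PERF_NMI_WINDOW_MS 100	/* window length stated in the changelog */

/* Hypothetical per-CPU state; the patch's actual fields are not shown here. */
struct perf_nmi_state {
	bool     armed;		/* has a PMC been handled yet? */
	uint64_t window_end_ms;	/* NMIs before this time are claimed */
};

/* Called when the perf NMI handler actually handled a PMC overflow. */
static void perf_nmi_note_handled(struct perf_nmi_state *s, uint64_t now_ms)
{
	assert(s != NULL);

	s->armed = true;
	s->window_end_ms = now_ms + PERF_NMI_WINDOW_MS;
}

/*
 * Called for an NMI that found no PMC overflow: claim it only while the
 * window opened by the last handled PMC is still running.
 */
static bool perf_nmi_claims(const struct perf_nmi_state *s, uint64_t now_ms)
{
	assert(s != NULL);

	return s->armed && now_ms < s->window_end_ms;
}
```

An NMI arriving 50 msecs after a handled PMC is swallowed by perf here; one
arriving 100 msecs or more later falls through to the next handler, which
is where something like hpwdt would get its chance.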