[RFC PATCH 0/2] x86/perf/amd: AMD PMC counters and NMI latency

From: Lendacky, Thomas
Date: Mon Mar 11 2019 - 12:48:42 EST


This patch series addresses issues with increased NMI latency in newer
AMD processors that can result in unknown NMI messages when PMC counters
are active.

The following fixes are included in this series:

- Resolve a race condition when disabling an overflowed PMC counter,
  specifically when updating the PMC counter with a new value.
- Resolve the handling of multiple active PMC counter overflows in the
  perf NMI handler, including when to report that the NMI is not related
  to a PMC.

I'm looking for feedback on the direction taken in these two patches. In
the first patch, the reset check loop typically runs only one iteration
and very rarely runs 3 or 4 times. I also looked at an alternative to the
first patch in which a per-CPU, per-PMC flag is set and then checked in
the NMI handler when the PMC is found not to have overflowed. That
approach loses the sample, though (on average I was seeing about 0.20%
sample loss that way).
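
For readers who want a picture of the reset check loop, here is a rough
user-space sketch. It is not the code from the patch. The counter is
simulated, and every name in it (sim_pmc, sim_read_counter,
wait_for_reset, CNTVAL_BITS, MAX_WAIT_ITERS) is hypothetical. It also
assumes that a counter which has been reloaded can be recognized by its
top bit being set, since counters are programmed with a negative start
value and count up toward overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values, chosen only for this illustration. */
#define CNTVAL_BITS    48  /* assumed counter width in bits */
#define MAX_WAIT_ITERS 50  /* assumed upper bound on polling */

/*
 * Simulated PMC. It reports its initial value until the simulated NMI
 * handler reloads it, which happens on the read that takes
 * reads_until_reset down to zero. A value of 0 means "never reloaded".
 */
struct sim_pmc {
	uint64_t value;
	int reads_until_reset;
};

static uint64_t sim_read_counter(struct sim_pmc *pmc)
{
	if (pmc->reads_until_reset > 0 && --pmc->reads_until_reset == 0) {
		/* Reload with a negative start value (top bit set). */
		pmc->value = (1ULL << CNTVAL_BITS) - 1000;
	}
	return pmc->value;
}

/* Assumption: top bit set means the counter has been reloaded. */
static int counter_was_reset(uint64_t value)
{
	return (value & (1ULL << (CNTVAL_BITS - 1))) != 0;
}

/*
 * Poll the counter until it looks reset or the iteration bound is hit.
 * Returns the number of iterations used, or -1 if the bound was reached.
 */
static int wait_for_reset(struct sim_pmc *pmc)
{
	assert(pmc != NULL);

	for (int i = 1; i <= MAX_WAIT_ITERS; i++) {
		if (counter_was_reset(sim_read_counter(pmc)))
			return i;
		/* Real code would pause briefly here before re-reading. */
	}
	return -1;
}
```

The bound on the loop matters more than its exact value: the common case
exits on the first read, and the bound only keeps the disable path from
spinning forever if the counter is never reloaded.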

---

This patch series is based on the perf/core branch of tip:
https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core

Commit c978b9460fe1 ("Merge tag 'perf-core-for-mingo-5.1-20190225' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core")