Re: [PATCH v2] KVM: x86/intr: Explicitly check NMI from guest to eliminate false positives

From: Sean Christopherson
Date: Tue Feb 06 2024 - 16:08:57 EST


On Tue, Feb 06, 2024, Sean Christopherson wrote:
> +Oliver
>
> On Wed, Dec 06, 2023, Like Xu wrote:
> > Note that when vm-exit is indeed triggered by PMI and before HANDLING_NMI
> > is cleared, it's also still possible that another PMI is generated on host.
> > Also for perf/core timer mode, the false positives are still possible since
> > that non-NMI sources of interrupts are not always being used by perf/core.
> > In both cases above, perf/core should correctly distinguish between real
> > RIP sources or even need to generate two samples, belonging to host and
> > guest separately, but that's perf/core's story for interested warriors.
>
> Oliver has a patch[*] that he promised he would send "soon" (wink wink) to
> properly fix events that are configured to exclude the guest. Unless someone
> objects, I'm going to tweak the last part of the changelog to be:
>
> Note that when VM-exit is indeed triggered by PMI and before HANDLING_NMI
> is cleared, it's also still possible that another PMI is generated on host.
> Also, for perf/core timer mode, false positives are still possible since
> non-NMI sources of interrupts are not always being used by perf/core.
>
> For events that are host-only, perf/core can and should eliminate false
> positives by checking event->attr.exclude_guest, i.e. events that are
> configured to exclude KVM guests should never fire in the guest.
>
> Events that are configured to count host and guest are trickier, perhaps
> impossible to handle with 100% accuracy? And regardless of what accuracy
> is provided by perf/core, improving KVM's accuracy is cheap and easy, with
> no real downsides.

Never mind, this causes KUT's pmu_pebs test to fail:

FAIL: Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x2): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x4): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1f000008): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
FAIL: Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x2): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
FAIL: Adaptive (0x2): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x4): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
FAIL: Adaptive (0x4): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1f000008): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1f000008): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x2): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x4): Multiple (0x700000055): No OVF irq, none PEBS records.
FAIL: Adaptive (0x1f000008): Multiple (0x700000055): No OVF irq, none PEBS records.

It might be a test bug, but I have neither the time nor the inclination to
investigate.


Like,

If you want any chance of your patches going anywhere but my trash folder, you
need to change your upstream workflow to actually run tests. I would give most
people the benefit of the doubt, e.g. assume they didn't have the requisite
hardware, or didn't realize which tests would be relevant/important. But this
is a recurring problem, and you have been warned, multiple times.