Re: [PATCH v2] KVM: x86/intr: Explicitly check NMI from guest to eliminate false positives

From: Sean Christopherson
Date: Mon Feb 26 2024 - 19:11:19 EST


On Sun, Feb 18, 2024, Like Xu wrote:
> On 7/2/2024 5:08 am, Sean Christopherson wrote:
> > On Tue, Feb 06, 2024, Sean Christopherson wrote:
> > Never mind, this causes KUT's pmu_pebs test to fail:
> >
> > FAIL: Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x1): Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x2): Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x4): Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x1f000008): Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
> > FAIL: Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x1): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x1): Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x2): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x2): Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x4): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x4): Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x1f000008): GP counter 0 (0xfffffffffffe): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x1f000008): Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x1): Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x2): Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x4): Multiple (0x700000055): No OVF irq, none PEBS records.
> > FAIL: Adaptive (0x1f000008): Multiple (0x700000055): No OVF irq, none PEBS records.
> >
> > It might be a test bug, but I have neither the time nor the inclination to
> > investigate.
>
> For the PEBS overflow case, in_nmi() returns 0x100000 from the core kernel,
> which never equals the 0/1 result of the "== KVM_HANDLING_NMI" comparison.
> The following diff fixes the issue:
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 995760ba072f..dcf665251fce 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1891,7 +1891,7 @@ enum kvm_intr_type {
>  /* Enable perf NMI and timer modes to work, and minimise false positives. */
>  #define kvm_arch_pmi_in_guest(vcpu) \
>  	((vcpu) && (vcpu)->arch.handling_intr_from_guest && \
> -	 (in_nmi() == ((vcpu)->arch.handling_intr_from_guest == KVM_HANDLING_NMI)))
> +	 (!!in_nmi() == ((vcpu)->arch.handling_intr_from_guest == KVM_HANDLING_NMI)))
>
>  void __init kvm_mmu_x86_module_init(void);
>  int kvm_mmu_vendor_module_init(void);
>
> Does it help? (Tests passed on ICX.)

Yes, that resolves the issues I was seeing. I'll get this applied with the above
squashed.

I'll also see if the tip tree folks would be open to converting the
in_{nmi,hardirq,...}() macros to functions that return bools (or at least
casting to bools in the macros).  I can't see any reason for in_nmi() to
effectively return an int since it's just a wrapper around nmi_count(), and
this seems like a disaster waiting to happen.
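
To illustrate the hazard, here is a minimal userspace sketch, not kernel code.
Every fake_* / FAKE_* name is a hypothetical stand-in, and the mask value is an
assumption chosen only so that the result matches the 0x100000 reported above.
It contrasts the raw comparison, the "!!" normalization from the squashed diff,
and a bool-returning helper of the kind suggested here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the NMI bits of the preempt count (assumption). */
#define FAKE_NMI_MASK 0x00f00000u

/* Hypothetical stand-in for enum kvm_intr_type. */
enum fake_intr_type {
	FAKE_HANDLING_NONE = 0,
	FAKE_HANDLING_IRQ,
	FAKE_HANDLING_NMI,
};

/* Mimics an in_nmi() that yields the masked count rather than 0 or 1. */
static unsigned int fake_in_nmi(unsigned int preempt_count)
{
	return preempt_count & FAKE_NMI_MASK;
}

/* Raw form: compares a mask value such as 0x100000 against a 0/1 result. */
static bool pmi_in_guest_raw(unsigned int pc, enum fake_intr_type t)
{
	return t && (fake_in_nmi(pc) == (t == FAKE_HANDLING_NMI));
}

/* Normalized form: "!!" collapses any non-zero value to 1 before comparing. */
static bool pmi_in_guest_norm(unsigned int pc, enum fake_intr_type t)
{
	return t && (!!fake_in_nmi(pc) == (t == FAKE_HANDLING_NMI));
}

/* Bool-returning helper: the conversion to bool happens once, at the source. */
static bool fake_in_nmi_bool(unsigned int pc)
{
	return fake_in_nmi(pc);
}

/* With a bool helper, callers can compare directly without "!!". */
static bool pmi_in_guest_bool(unsigned int pc, enum fake_intr_type t)
{
	return t && (fake_in_nmi_bool(pc) == (t == FAKE_HANDLING_NMI));
}
```

With pc = 0x100000 and t = FAKE_HANDLING_NMI, the raw form evaluates to false
(0x100000 != 1), while the normalized and bool-helper forms evaluate to true.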

> > If you want any chance of your patches going anywhere but my trash folder, you
> > need to change your upstream workflow to actually run tests. I would give most
> > people the benefit of the doubt, e.g. assume they didn't have the requisite
> > hardware, or didn't realize which tests would be relevant/important. But this
> > is a recurring problem, and you have been warned, multiple times.
>
> Sorry, my CI resources are diverted to other downstream projects.
> But there's no doubt it's my fault and this behavior will be corrected.

Thank you.