Re: [PATCH] KVM: x86/intr: Explicitly check NMI from guest to eliminate false positives

From: Sean Christopherson
Date: Tue Dec 05 2023 - 19:10:04 EST


On Tue, Dec 05, 2023, Like Xu wrote:
> On 4/12/2023 11:55 pm, Sean Christopherson wrote:
> > On Mon, Dec 04, 2023, Like Xu wrote:
> > > From: Like Xu <likexu@xxxxxxxxxxx>
> > >
> > > Explicitly checking the source of external interrupt is indeed NMI and not
> > > other types in the kvm_arch_pmi_in_guest(), which prevents perf-kvm false
> > > positive samples generated after vm-exit but before kvm_before_interrupt()
> > > from being incorrectly labelled as guest samples:
> >
> > ...
> >
> > > Fixes: 73cd107b9685 ("KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable")
> >
> > The behavior is deliberate, and was added by commit dd60d217062f ("KVM: x86: Fix
> > perf timer mode IP reporting"). *If* we want to undo that, then the best "fix"
> > would be to effective reverting that commit by dropping the IRQ usage of
> > kvm_before_interrupt() and renaming the helpers kvm_{before,after}_nmi(). But
> > my understanding is that the behavior is necessary for select PMU usage.
>
> Thanks for your comment. Yes, the commit dd60d217062f should be tracked.
>
> We don't have to undo the commit, and we also don't want to hurt
> either of the perf/core use cases (including perf NMI and timer modes).
>
> Thanks to the introduction of "enum kvm_intr_type", we can cover both cases
> instead of sacrificing one of two modes, how about:

Hmm, yeah, that should work. It's not the prettiest thing, but I don't see an
easy way to remedy that (I tried). False positives are still possible, but it's
a clear improvement over what we have.

> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index c8c7e2475a18..5db607a564c6 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1868,8 +1868,16 @@ static inline int
> kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn,
> }
> #endif /* CONFIG_HYPERV */
>
> +enum kvm_intr_type {
> + /* Values are arbitrary, but must be non-zero. */
> + KVM_HANDLING_IRQ = 1,
> + KVM_HANDLING_NMI,
> +};
> +
> +/* Linux always use NMI for PMU. */
> #define kvm_arch_pmi_in_guest(vcpu) \
> - ((vcpu) && (vcpu)->arch.handling_intr_from_guest)
> + ((vcpu) && (vcpu)->arch.handling_intr_from_guest && \
> + (in_nmi() == ((vcpu)->arch.handling_intr_from_guest == KVM_HANDLING_NMI)))
>
> void __init kvm_mmu_x86_module_init(void);
> int kvm_mmu_vendor_module_init(void);
> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
> index 2f7e19166658..4dc38092d599 100644
> --- a/arch/x86/kvm/x86.h
> +++ b/arch/x86/kvm/x86.h
> @@ -431,12 +431,6 @@ static inline bool kvm_notify_vmexit_enabled(struct kvm *kvm)
> return kvm->arch.notify_vmexit_flags & KVM_X86_NOTIFY_VMEXIT_ENABLED;
> }
>
> -enum kvm_intr_type {
> - /* Values are arbitrary, but must be non-zero. */
> - KVM_HANDLING_IRQ = 1,
> - KVM_HANDLING_NMI,
> -};
> -
> static __always_inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
> enum kvm_intr_type intr)
> {
> --
> 2.43.0
>
> I noticed that timer mode is used when perf-record uses SW events like
> "cpu-clock" event,
> with or without hw-PMU support. My side of the tests no longer show false
> positives and
> the above diff does not affect the use of perf timer mode as well. Any better move ?