Re: [PATCH] KVM: x86/intr: Explicitly check NMI from guest to eliminate false positives

From: Like Xu
Date: Tue Dec 05 2023 - 01:24:39 EST


On 4/12/2023 11:55 pm, Sean Christopherson wrote:
On Mon, Dec 04, 2023, Like Xu wrote:
From: Like Xu <likexu@xxxxxxxxxxx>

Explicitly checking the source of external interrupt is indeed NMI and not
other types in the kvm_arch_pmi_in_guest(), which prevents perf-kvm false
positive samples generated after vm-exit but before kvm_before_interrupt()
from being incorrectly labelled as guest samples:

...

Fixes: 73cd107b9685 ("KVM: x86: Drop current_vcpu for kvm_running_vcpu + kvm_arch_vcpu variable")

The behavior is deliberate, and was added by commit dd60d217062f ("KVM: x86: Fix
perf timer mode IP reporting"). *If* we want to undo that, then the best "fix"
would be to effective reverting that commit by dropping the IRQ usage of
kvm_before_interrupt() and renaming the helpers kvm_{before,after}_nmi(). But
my understanding is that the behavior is necessary for select PMU usage.

Thanks for your comment. Yes, the commit dd60d217062f should be tracked.

We don't have to undo the commit, and we also don't want to hurt
either of the perf/core use cases (including perf NMI and timer modes).

Thanks to the introduction of "enum kvm_intr_type", we can cover both cases
instead of sacrificing one of two modes, how about:

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c8c7e2475a18..5db607a564c6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1868,8 +1868,16 @@ static inline int kvm_arch_flush_remote_tlbs_range(struct kvm *kvm, gfn_t gfn,
}
#endif /* CONFIG_HYPERV */

+enum kvm_intr_type {
+ /* Values are arbitrary, but must be non-zero. */
+ KVM_HANDLING_IRQ = 1,
+ KVM_HANDLING_NMI,
+};
+
+/* Linux always use NMI for PMU. */
#define kvm_arch_pmi_in_guest(vcpu) \
- ((vcpu) && (vcpu)->arch.handling_intr_from_guest)
+ ((vcpu) && (vcpu)->arch.handling_intr_from_guest && \
+ (in_nmi() == ((vcpu)->arch.handling_intr_from_guest == KVM_HANDLING_NMI)))

void __init kvm_mmu_x86_module_init(void);
int kvm_mmu_vendor_module_init(void);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 2f7e19166658..4dc38092d599 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -431,12 +431,6 @@ static inline bool kvm_notify_vmexit_enabled(struct kvm *kvm)
return kvm->arch.notify_vmexit_flags & KVM_X86_NOTIFY_VMEXIT_ENABLED;
}

-enum kvm_intr_type {
- /* Values are arbitrary, but must be non-zero. */
- KVM_HANDLING_IRQ = 1,
- KVM_HANDLING_NMI,
-};
-
static __always_inline void kvm_before_interrupt(struct kvm_vcpu *vcpu,
enum kvm_intr_type intr)
{
--
2.43.0

I noticed that timer mode is used when perf-record uses SW events like "cpu-clock" event,
with or without hw-PMU support. My side of the tests no longer show false positives and
the above diff does not affect the use of perf timer mode as well. Any better move ?