Re: [PATCH 3/3] KVM: x86: Check for injected exceptions before queuing a debug exception
From: Sean Christopherson
Date: Mon Mar 02 2026 - 18:23:55 EST
On Fri, Feb 27, 2026, Yosry Ahmed wrote:
> > > That being said, I hate nested_run_in_progress. It's too close to
> > > nested_run_pending and I am pretty sure they will be mixed up.
> >
> > Agreed, though the fact that the name is _too_ close means that, aside from
> > the potential for disaster (minor detail), it's accurate.
> >
> > One thought is to hide nested_run_in_progress behind a Kconfig, so that attempts
> > to use it for anything but the sanity check(s) would fail the build. I don't
> > really want to create yet another KVM_PROVE_xxx though, because unlike KVM_PROVE_MMU,
> > I think we want this enabled in production.
> >
> > I'll chew on this a bit...
>
> Maybe (if we go this direction) name it very explicitly
> warn_on_nested_exception if it's only intended to be used for the
> sanity checks?
It's not just about exceptions though. That's the case that has caused a rash
of recent problems, but the rule isn't specific to exceptions, it's very broadly
Thou Shalt Not Cancel VMRUN.
I think that's where there's some disconnect. We can't make the nested_run_pending
warnings go away by adding more sanity checks, and I am dead set against removing
those warnings.
Aha! Idea. What if we turn nested_run_pending into a u8, and use a magic value
of '2' to indicate that userspace gained control of the CPU since nested_run_pending
was set, and then only WARN on nested_run_pending==1? That way we don't have to
come up with a new name, and there's zero chance of nested_run_pending and something
like nested_run_in_progress getting out of sync.
---
arch/x86/include/asm/kvm_host.h | 6 +++++-
arch/x86/kvm/svm/nested.c | 3 ++-
arch/x86/kvm/vmx/nested.c | 4 ++--
arch/x86/kvm/x86.c | 7 +++++++
arch/x86/kvm/x86.h | 10 ++++++++++
5 files changed, 26 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 19b3790e5e99..a8d39b3aff6a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1104,8 +1104,12 @@ struct kvm_vcpu_arch {
* can only occur at instruction boundaries. The only exception is
* VMX's "notify" exits, which exist in large part to break the CPU out
* of infinite ucode loops, but can corrupt vCPU state in the process!
+ *
+ * For all intents and purposes, this is a boolean, but it's tracked as
+ * a u8 so that KVM can detect when userspace may have stuffed vCPU
+ * state and generated an architecturally-impossible VM-Exit.
*/
- bool nested_run_pending;
+ u8 nested_run_pending;
#if IS_ENABLED(CONFIG_HYPERV)
hpa_t hv_root_tdp;
diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index c2d4c9c63146..77ff9ead957c 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1138,7 +1138,8 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
/* Exit Guest-Mode */
leave_guest_mode(vcpu);
svm->nested.vmcb12_gpa = 0;
- WARN_ON_ONCE(vcpu->arch.nested_run_pending);
+
+ kvm_warn_on_nested_run_pending(vcpu);
kvm_clear_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 031075467a6d..5659545360dc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5042,7 +5042,7 @@ void __nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
vmx->nested.mtf_pending = false;
/* trying to cancel vmlaunch/vmresume is a bug */
- WARN_ON_ONCE(vcpu->arch.nested_run_pending);
+ kvm_warn_on_nested_run_pending(vcpu);
#ifdef CONFIG_KVM_HYPERV
if (kvm_check_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu)) {
@@ -6665,7 +6665,7 @@ bool nested_vmx_reflect_vmexit(struct kvm_vcpu *vcpu)
unsigned long exit_qual;
u32 exit_intr_info;
- WARN_ON_ONCE(vcpu->arch.nested_run_pending);
+ kvm_warn_on_nested_run_pending(vcpu);
/*
* Late nested VM-Fail shares the same flow as nested VM-Exit since KVM
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d9..30ff5a755572 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12023,6 +12023,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
if (r <= 0)
goto out;
+ /*
+ * If userspace may have modified vCPU state, mark nested_run_pending
+ * as "untrusted" to avoid triggering false-positive WARNs.
+ */
+ if (vcpu->arch.nested_run_pending == 1)
+ vcpu->arch.nested_run_pending = 2;
+
r = vcpu_run(vcpu);
out:
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 94d4f07aaaa0..d3003c8be961 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -188,6 +188,16 @@ static inline bool kvm_can_set_cpuid_and_feature_msrs(struct kvm_vcpu *vcpu)
return vcpu->arch.last_vmentry_cpu == -1 && !is_guest_mode(vcpu);
}
+/*
+ * WARN if a nested VM-Enter is pending completion, and userspace hasn't gained
+ * control since the nested VM-Enter was initiated (in which case, userspace
+ * may have modified vCPU state to induce an architecturally invalid VM-Exit).
+ */
+static inline void kvm_warn_on_nested_run_pending(struct kvm_vcpu *vcpu)
+{
+ WARN_ON_ONCE(vcpu->arch.nested_run_pending == 1);
+}
+
static inline void kvm_set_mp_state(struct kvm_vcpu *vcpu, int mp_state)
{
vcpu->arch.mp_state = mp_state;
base-commit: a68a4bbc5b9ce5b722473399f05cb05217abaee8
--