Re: [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending

From: Yosry Ahmed

Date: Tue Jun 16 2026 - 14:09:24 EST


On Tue, Jun 16, 2026 at 10:46 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Mon, Jun 15, 2026, Yosry Ahmed wrote:
> > > > The code makes sense to me but I am trying to make sense of the changelog.
> > >
> > > What part (parts?) is confusing? Honest question. I'm trying to reword the
> > > changelog to make it "better", but I'm failing miserably because I don't know
> > > what's wrong :-)
> >
> > 1. For kvm_vcpu_has_events() being unaffected, the explanation in
> > paragraph #3 is focused on the code path from nested_vmx_run() ->
> > kvm_emulate_halt_noskip(). I don't immediately see how
> > kvm_arch_vcpu_runnable() is unaffected.
>
> To reach kvm_vcpu_has_events(), kvm_vcpu_running() needs to return false. For
> that to happen, vcpu->arch.mp_state needs to be something other than RUNNABLE.
>
> If nested_run_pending is true, then mp_state *must* be RUNNABLE (barring bugs or
> stupid userspace), because KVM shouldn't emulate VMRUN/VMLAUNCH/VMRESUME while
> the vCPU is !RUNNABLE.
>
> I didn't include that in the changelog because I thought it was obvious, but
> obviously (LOL) not :-D
>
> I called out the GUEST_ACTIVITY_HLT case because (to me) that is less obvious.
>
> > 2. More importantly, paragraphs #3 and #4 read like this patch would
> > regress kvm_vcpu_ready_for_interrupt_injection() and
> > kvm_vcpu_has_events() if it affected them. Maybe clearly state that
> > this patch is the right thing to do for these 2 functions as well, but
> > they are more-or-less unaffected by the bug anyway? For
> > kvm_vcpu_ready_for_interrupt_injection(), maybe just make it more
> > clear in paragraph #4 that it currently incorrectly treats interrupts
> > as allowed in the problematic scenario, but it is not a problem
> > because ..., and it only results in a spurious exit to userspace (or
> > not even that?).
>
> Is this better?

Yes, much clearer (to me). Thanks for bearing with me :)

>
> When querying whether or not interrupts (IRQs) are allowed, check for a
> pending nested run _after_ checking whether or not interrupts are blocked.
> If L1 is running L2 _without_ nested_exit_on_intr(), i.e. if L1 IRQs can
> be blocked while running L2, and interrupts will indeed be blocked once the
> nested VM-Enter to L2 is completed, then KVM should treat interrupts as not
> being allowed.
>
> For injection, this avoids an unnecessary (forced) VM-Exit, as KVM can
> immediately request an IRQ window, instead of forcing an exit and _then_
> requesting an IRQ window (because after the forced exit, KVM will see that
> interrupts are blocked).
>
> For non-injection usage, only kvm_vcpu_ready_for_interrupt_injection() is
> affected in practice. Barring KVM bugs or misbehaving userspace (at which
> point all architectural guarantees are off), kvm_vcpu_has_events() is
> unreachable when a nested run is pending. To reach kvm_vcpu_has_events(),
> kvm_vcpu_running() needs to return false, i.e. vcpu->arch.mp_state needs
> to be something other than RUNNABLE. If nested_run_pending is true, then
> mp_state *must* be RUNNABLE (again barring bugs or stupid userspace),
> because KVM shouldn't emulate VMRUN/VMLAUNCH/VMRESUME while the vCPU is
> !RUNNABLE.
>
> The one "near miss" is VMX's GUEST_ACTIVITY_STATE field, which allows L1 to
> put the vCPU into HLT or WFS as part of nested VMLAUNCH/VMRESUME. However,
> KVM clears nested_run_pending prior to calling kvm_emulate_halt_noskip()
> when putting L2 into HLT via GUEST_ACTIVITY_HLT, and also clears the flag
> before setting mp_state to INIT_RECEIVED. SVM has no equivalent to
> GUEST_ACTIVITY_STATE.
>
> I.e. the vCPU will always be runnable if a nested run is pending, and thus
> kvm_arch_vcpu_runnable() => kvm_vcpu_has_events() is effectively dead code,
> as is __kvm_emulate_halt() => kvm_vcpu_has_events(). Oh, and TDX doesn't
> support nested VMX. Similarly, kvm_can_do_async_pf() is unreachable as
> KVM shouldn't be faulting in memory with a pending nested VM-Enter.
>
> As for kvm_vcpu_ready_for_interrupt_injection(), KVM's current behavior of
> incorrectly treating interrupts as being allowed could result in KVM
> prematurely exiting to userspace to accept an ExtINT. But, KVM will still
> hold/block the ExtINT and request its own IRQ window. I.e. the net effect
> is more or less the same as the for-injection case; the unnecessary exit
> just happens at a different boundary.
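
For anyone skimming the thread, here is a minimal standalone sketch of the
reordering the changelog describes. It is not the actual patch: the struct,
its fields, and both function names are hypothetical, and it assumes a
tri-state return (1 = allowed, 0 = blocked, -EBUSY = can't decide yet, force
an exit) where boolean callers treat any non-zero value as "allowed".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for vCPU state; not KVM's real structs. */
struct sketch_vcpu {
	bool nested_run_pending;	/* nested VM-Enter emulated, not yet run */
	bool irqs_blocked;		/* IRQs will be blocked once L2 is entered */
};

/*
 * Ordering before the change (as the changelog implies): the pending nested
 * run is checked first, so -EBUSY is returned even when IRQs are blocked.
 * A boolean caller sees non-zero and treats interrupts as allowed.
 */
static int sketch_irq_allowed_before(const struct sketch_vcpu *v)
{
	if (v->nested_run_pending)
		return -EBUSY;

	return !v->irqs_blocked;
}

/*
 * Ordering after the change: blocked IRQs win, so the caller can request an
 * IRQ window right away instead of forcing an exit and only then finding out
 * that interrupts are blocked.
 */
static int sketch_irq_allowed_after(const struct sketch_vcpu *v)
{
	if (v->irqs_blocked)
		return 0;

	if (v->nested_run_pending)
		return -EBUSY;

	return 1;
}
```

The only input where the two orderings disagree is nested_run_pending with
IRQs blocked, which is the problematic scenario from the changelog.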