Re: [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending

From: Yosry Ahmed

Date: Mon Jun 15 2026 - 12:48:12 EST


On Mon, Jun 15, 2026 at 9:40 AM Yosry Ahmed <yosry@xxxxxxxxxx> wrote:
>
> On Fri, Jun 12, 2026 at 5:04 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> >
> > When querying whether or not interrupts (IRQs) are allowed, check for a
> > pending nested run _after_ checking whether or not interrupts are blocked.
> > If L1 is running L2 _without_ nested_exit_on_intr(), i.e. if L1 IRQs can
> > be blocked while running L2, and interrupts will indeed be blocked once the
> > nested VM-Enter to L2 is completed, then KVM should treat interrupts as not
> > being allowed.
> >
> > For injection, this avoids an unnecessary (forced) VM-Exit, as KVM can
> > immediately request an IRQ window, instead of forcing an exit and _then_
> > requesting an IRQ window (because after the forced exit, KVM will see that
> > interrupts are blocked).
> >
> > For non-injection usage, only kvm_vcpu_ready_for_interrupt_injection() is
> > affected in practice. kvm_vcpu_has_events() is unreachable when a nested
> > run is pending, as KVM clears nested_run_pending prior to calling
> > kvm_emulate_halt_noskip() when putting L2 into HLT via GUEST_ACTIVITY_HLT,
> > and SVM has no equivalent to GUEST_ACTIVITY_STATE. I.e. the vCPU will
> > always be runnable if a nested run is pending, and thus
> > kvm_arch_vcpu_runnable() => kvm_vcpu_has_events() is effectively dead code,
> > as is __kvm_emulate_halt() => kvm_vcpu_has_events(). Oh, and TDX doesn't
> > support nested VMX. Similarly, kvm_can_do_async_pf() is unreachable as
> > KVM shouldn't be faulting in memory with a pending nested VM-Enter.
> >
> > As for kvm_vcpu_ready_for_interrupt_injection(), incorrectly treating
> > interrupts as being allowed could result in KVM prematurely exiting to
> > userspace to accept an ExtINT.
>
> "incorrectly treating interrupts as being allowed" is the status quo,
> that this patch fixes, not sth this patch introduces -- right?
>
> The changelog reads like for the non-injection case this change might
> not be the right thing to do, but I don't think this is the case? I
> assume returning false from
> kvm_vcpu_ready_for_interrupt_injection() and kvm_vcpu_has_events() if
> L1's interrupts are blocked while L2 is running is the right thing to
> do?
>
> The code makes sense to me but I am trying to make sense of the changelog.
>
> Aside from that, I have two comments about existing issues (Sashiko-style):
>
> 1. The return values of {vmx/svm}_interrupt_allowed() are annoying.
> IIUC, 0 is not allowed, 1 is allowed, and -EBUSY is generally allowed
> but not right now, request immediate exit and try again? That should
> be documented somewhere (maybe I just missed it?).

The next patch is adding the documentation, I spoke too soon :)

>
> 2. Aside from TDX, seems like {vmx/svm}_interrupt_allowed() are doing
> the same thing? So maybe move all that logic into
> kvm_arch_interrupt_allowed(), rename it (because it's only used by
> x86), and make interrupt_blocked() the actual per-vendor callback? I
> assume this is the case because nested_run_pending used to be
> per-vendor, but now we can clean this up. For TDX, I assume the
> per-vendor hook can just be tdx_interrupt_allowed() reversed?

It also does the renaming (nice!), so we're halfway there. I think for
this series, we should at least convert all direct calls to the vendor
.interrupt_allowed through the new kvm_is_interrupt_allowed() helper.
Unless you wanna go all the way and rework .interrupt_allowed to
.interrupt_blocked while at it as well.