Re: [PATCH v4 26/30] KVM: x86: Don't treat interrupts as allowed just because a nested run is pending
From: Sean Christopherson
Date: Mon Jun 15 2026 - 13:13:49 EST
On Mon, Jun 15, 2026, Yosry Ahmed wrote:
> On Mon, Jun 15, 2026 at 9:40 AM Yosry Ahmed <yosry@xxxxxxxxxx> wrote:
> >
> > On Fri, Jun 12, 2026 at 5:04 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > >
> > > When querying whether or not interrupts (IRQs) are allowed, check for a
> > > pending nested run _after_ checking whether or not interrupts are blocked.
> > > If L1 is running L2 _without_ nested_exit_on_intr(), i.e. if L1 IRQs can
> > > be blocked while running L2, and interrupts will indeed be blocked once the
> > > nested VM-Enter to L2 is completed, then KVM should treat interrupts as not
> > > being allowed.
> > >
> > > For injection, this avoids an unnecessary (forced) VM-Exit, as KVM can
> > > immediately request an IRQ window, instead of forcing an exit and _then_
> > > requesting an IRQ window (because after the forced exit, KVM will see that
> > > interrupts are blocked).
> > >
> > > For non-injection usage, only kvm_vcpu_ready_for_interrupt_injection() is
> > > affected in practice. kvm_vcpu_has_events() is unreachable when a nested
> > > run is pending, as KVM clears nested_run_pending prior to calling
> > > kvm_emulate_halt_noskip() when putting L2 into HLT via GUEST_ACTIVITY_HLT,
> > > and SVM has no equivalent to GUEST_ACTIVITY_STATE. I.e. the vCPU will
> > > always be runnable if a nested run is pending, and thus
> > > kvm_arch_vcpu_runnable() => kvm_vcpu_has_events() is effectively dead code,
> > > as is __kvm_emulate_halt() => kvm_vcpu_has_events(). Oh, and TDX doesn't
> > > support nested VMX. Similarly, kvm_can_do_async_pf() is unreachable as
> > > KVM shouldn't be faulting in memory with a pending nested VM-Enter.
> > >
> > > As for kvm_vcpu_ready_for_interrupt_injection(), incorrectly treating
> > > interrupts as being allowed could result in KVM prematurely exiting to
> > > userspace to accept an ExtINT.
> >
> > "incorrectly treating interrupts as being allowed" is the status quo,
> > that this patch fixes, not sth this patch introduces -- right?
> >
> > The changelog reads like for the non-injection case this change might
> > not be the right thing to do, but I don't think this is the case? I
> > assume returning false from
> > kvm_vcpu_ready_for_interrupt_injection() and kvm_vcpu_has_events() if
> > L1's interrupts are blocked while L2 is running is the right thing to
> > do?
> >
> > The code makes sense to me but I am trying to make sense of the changelog.
> >
> > Aside from that, I have two comments about existing issues (Sashiko-style):
> >
> > 1. The return values of {vmx/svm}_interrupt_allowed() are annoying.
> > IIUC, 0 is not allowed, 1 is allowed, and -EBUSY is generally allowed
> > but not right now, request immediate exit and try again? That should
> > be documented somewhere (maybe I just missed it?).
>
> The next patch is adding the documentation, I spoke too soon :)
>
> >
> > 2. Aside from TDX, seems like {vmx/svm}_interrupt_allowed() are doing
> > the same thing? So maybe move all that logic into
> > kvm_arch_interrupt_allowed(), rename it (because it's only used by
> > x86), and make interrupt_blocked() the actual per-vendor callback?
Doesn't work, at least not with more changes/hooks, because nested_exit_on_intr()
is subtly vendor specifc. The concept is identical, but the implementation needs
to query vmc{b,s}12. :-/
I'm not opposed to figuring out a way to move the logic to common x86, because
I agree that's where it would ideally live, but I don't want to tack that in this
series.
> > I assume this is the case because nested_run_pending used to be per-vendor,
> > but now we can clean this up. For TDX, I assume the per-vendor hook can
> > just be tdx_interrupt_allowed() reversed?
>
> It also does the renaming (nice!), so we're halfway there. I think for
> this series, we should at least convert all direct calls to the vendor
> .interrupt_allowed through the new kvm_is_interrupt_allowed() helper.
Sadly not in this series, because handling the for_injection=true case requires
moving the nested_run_pending logic to common x86, and without that, we end up
with two very silly wrappers.
> Unless you wanna go all the way and rework .interrupt_allowed to
> .interrupt_blocked while at it as well.