Re: [PATCH] KVM: x86: Ignore pending PV EOI if the vCPU has since disabled PV EOIs
From: Sean Christopherson
Date: Fri Jun 26 2026 - 13:44:24 EST
On Thu, Jun 25, 2026, Kai Huang wrote:
> On Thu, 2026-06-25 at 08:33 -0700, Sean Christopherson wrote:
> > On Thu, Jun 25, 2026, Kai Huang wrote:
> > > On Wed, 2026-06-24 at 15:05 -0700, Sean Christopherson wrote:
> I was kinda thinking whether it's possible that there are two IRQs when
> vCPU.pv_eoi is active (e.g., one in IRR and one in ISR, with different vectors),
> but judging from the code it's not possible:
>
> 	if (!pv_eoi_enabled(vcpu) ||
> 	    /* IRR set or many bits in ISR: could be nested. */
> 	    apic->irr_pending ||
> 	    ...)) {
> 		return;
> 	}
> 	pv_eoi_set_pending(apic->vcpu);
>
> The reason behind this still eludes me :-(
The PV EOI stuff is all about eliding the EOIs in the guest in order to avoid
relatively useless VM-Exits (this pre-dates hardware virtualization of EOIs).
Instead of having the guest explicitly do EOI, KVM sets two flags: one to note
to itself that there is/was a pending PV EOI, and another that's shared with the
vCPU to track whether or not the guest ack'd the pending PV EOI. On the next
VM-Exit, KVM checks its internal pending PV EOI flag, and then does an actual
EOI on behalf of the guest if the shared bit was cleared, i.e. if the PV EOI was
ack'd by the guest.
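A toy model of that handshake, for illustration only. This is a sketch under stated assumptions, not KVM's code: the struct, its fields (kvm_pending, shared_bit) and all functions are hypothetical, and the model glosses over what KVM does when the guest hasn't ack'd the PV EOI by the next VM-Exit (here it simply stays armed).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the PV EOI handshake.  All names are hypothetical; this is
 * not KVM's or the guest kernel's actual code.
 */
struct pv_eoi_model {
	bool kvm_pending; /* KVM-internal: a PV EOI is/was pending */
	bool shared_bit;  /* flag in guest memory, shared with the vCPU */
	int kvm_eois;     /* EOIs KVM performed on the guest's behalf */
	int eoi_exits;    /* guest EOI writes that caused a VM-Exit */
};

/* KVM, when injecting an IRQ: opt into PV EOI for it. */
static void kvm_arm_pv_eoi(struct pv_eoi_model *m)
{
	/* One shared bit => at most one PV EOI in flight. */
	assert(!m->kvm_pending);
	m->kvm_pending = true;
	m->shared_bit = true;
}

/* Guest EOI path: clearing the shared flag replaces the EOI write. */
static void guest_eoi(struct pv_eoi_model *m)
{
	if (m->shared_bit) {
		m->shared_bit = false; /* ack the PV EOI, no VM-Exit */
		return;
	}
	m->eoi_exits++; /* no PV EOI armed: real EOI write, traps to KVM */
}

/*
 * KVM, on the next VM-Exit (for whatever reason): if the guest ack'd the
 * pending PV EOI, do the actual EOI on its behalf.  Simplification: an
 * un-ack'd PV EOI just stays armed in this model.
 */
static void kvm_handle_vmexit(struct pv_eoi_model *m)
{
	if (m->kvm_pending && !m->shared_bit) {
		m->kvm_pending = false;
		m->kvm_eois++;
	}
}
```

The point of the design is that the guest's EOI becomes a plain memory write; the "real" EOI is deferred until KVM gets control anyway.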

The irr_pending check above effectively disables PV EOI whenever another IRQ is
pending in IRR, because PV EOI has only a single bit, i.e. can track only a
single IRQ, and because KVM needs to know precisely when the pending IRQ becomes
unblocked, i.e. it can't lazily wait until the next VM-Exit.
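A rough sketch of that reasoning, modeling IRR/ISR as 256-bit bitmaps. may_arm_pv_eoi() and the bitmap helpers are hypothetical; the predicate only approximates the "IRR set or many bits in ISR" comment in the snippet quoted above, and KVM's real check has additional conditions (the elided "..." part).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 256-bit APIC register (IRR or ISR) as four 64-bit words. */
struct apic_bitmap {
	uint64_t w[4];
};

static void bitmap_set(struct apic_bitmap *b, unsigned int vec)
{
	assert(vec < 256);
	b->w[vec / 64] |= UINT64_C(1) << (vec % 64);
}

/* Number of set bits, i.e. number of pending or in-service vectors. */
static unsigned int bitmap_weight(const struct apic_bitmap *b)
{
	unsigned int n = 0;

	for (int i = 0; i < 4; i++)
		for (uint64_t x = b->w[i]; x; x &= x - 1)
			n++;
	return n;
}

/*
 * Hypothetical approximation of "may KVM use PV EOI for the in-service
 * IRQ?": the single shared bit can stand in for exactly one EOI, and only
 * if nothing is waiting in IRR, since KVM must see that EOI right away in
 * order to deliver the next IRQ.
 */
static bool may_arm_pv_eoi(bool pv_eoi_enabled,
			   const struct apic_bitmap *irr,
			   const struct apic_bitmap *isr)
{
	return pv_eoi_enabled && bitmap_weight(irr) == 0 &&
	       bitmap_weight(isr) == 1;
}
```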
> The CPU cannot execute another vector in ISR until the highest one is EOI-ed,
> right?
Nope. The rules for delivery, i.e. for moving an IRQ from IRR => ISR, are that
the IRQ's priority class (vector[7:4]) must be higher than PPR[7:4], that the
exact IRQ isn't already in progress (i.e. its ISR bit isn't set), and that IRQs
aren't blocked in the core/CPU (e.g. by RFLAGS.IF=0).
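Those three rules can be written as a small predicate. A simplified sketch: compute_ppr() follows the SDM's PPR definition (TPR vs. the class of the highest in-service vector, ISRV), while can_deliver() and its parameters are hypothetical and reduce "not blocked by the core" to a single irqs_enabled flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Priority class = bits 7:4 of a vector (or of TPR/PPR). */
static unsigned int prio_class(uint8_t v)
{
	return v >> 4;
}

/*
 * PPR per the SDM: if TPR's class >= the class of the highest in-service
 * vector (ISRV), PPR = TPR; otherwise PPR = ISRV & 0xF0.  Pass isrv = 0
 * when nothing is in service.
 */
static uint8_t compute_ppr(uint8_t tpr, uint8_t isrv)
{
	if (prio_class(tpr) >= prio_class(isrv))
		return tpr;
	return isrv & 0xF0;
}

/*
 * The delivery rules (IRR => ISR): priority class above PPR's, the vector
 * isn't already in service, and IRQs aren't blocked in the core.
 */
static bool can_deliver(uint8_t vector, uint8_t ppr, bool in_service,
			bool irqs_enabled)
{
	assert(vector >= 16); /* vectors 0-15 are illegal for fixed IRQs */
	return prio_class(vector) > prio_class(ppr) && !in_service &&
	       irqs_enabled;
}
```

E.g. with 0x31 in service and TPR=0, PPR is 0x30, so 0x51 is deliverable but 0x3a is not.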

From "12.8.4 Interrupt Acceptance for Fixed Interrupts":

  If the local APIC receives an interrupt with an interrupt-priority class
  higher than that of the interrupt currently in service, and interrupts are
  enabled in the processor core, the local APIC dispatches the higher priority
  interrupt to the processor immediately (without waiting for a write to the
  EOI register).
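To make the SDM text concrete, here's a toy local APIC that nests a higher-class IRQ over one that is still in service, leaving two bits set in ISR. Everything here (struct toy_lapic, try_dispatch(), eoi()) is hypothetical; it assumes IRQs are enabled in the core and models EOI as clearing the highest-priority ISR bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy local APIC: IRR/ISR as 256-entry bool arrays (hypothetical). */
struct toy_lapic {
	bool irr[256];
	bool isr[256];
	uint8_t tpr;
};

/* Highest set vector in a 256-entry bitmap, or -1 if empty. */
static int highest(const bool *bits)
{
	for (int v = 255; v >= 0; v--)
		if (bits[v])
			return v;
	return -1;
}

static uint8_t toy_ppr(const struct toy_lapic *a)
{
	int isrv = highest(a->isr);
	uint8_t isrv_hi = isrv < 0 ? 0 : (uint8_t)(isrv & 0xF0);

	/* SDM: PPR = TPR if TPR[7:4] >= ISRV[7:4], else ISRV[7:4]:0000. */
	return (a->tpr & 0xF0) >= isrv_hi ? a->tpr : isrv_hi;
}

/*
 * Dispatch the highest pending IRQ if its class beats PPR (IRQs assumed
 * enabled in the core).  Returns the vector dispatched, or -1.
 */
static int try_dispatch(struct toy_lapic *a)
{
	int v = highest(a->irr);

	if (v < 0 || (v & 0xF0) <= toy_ppr(a))
		return -1;
	a->irr[v] = false;
	a->isr[v] = true;
	return v;
}

/* EOI clears the highest-priority in-service vector. */
static void eoi(struct toy_lapic *a)
{
	int v = highest(a->isr);

	assert(v >= 0);
	a->isr[v] = false;
}
```

E.g. with 0x31 in service (PPR 0x30), a newly arrived 0x51 is dispatched immediately, so ISR holds both 0x31 and 0x51; a newly arrived 0x3a, on the other hand, has to wait for the EOI of 0x31.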