Re: [PATCH] KVM: x86: Ignore pending PV EOI if the vCPU has since disabled PV EOIs

From: Huang, Kai

Date: Mon Jun 29 2026 - 07:42:34 EST


On Fri, 2026-06-26 at 10:44 -0700, Sean Christopherson wrote:
> On Thu, Jun 25, 2026, Kai Huang wrote:
> > On Thu, 2026-06-25 at 08:33 -0700, Sean Christopherson wrote:
> > > On Thu, Jun 25, 2026, Kai Huang wrote:
> > > > On Wed, 2026-06-24 at 15:05 -0700, Sean Christopherson wrote:
> > I was kinda thinking whether it's possible that there are two IRQs when
> > vCPU.pv_eoi is active (e.g., one in IRR and one in ISR, with different vector),
> > but from the code right it's not possible:
> >
> >         if (!pv_eoi_enabled(vcpu) ||
> >             /* IRR set or many bits in ISR: could be nested. */
> >             apic->irr_pending ||
> >             ...)) {
> >                 return;
> >         }
> >         pv_eoi_set_pending(apic->vcpu);
> >
> > The reason behind this still eludes me :-(
>
> The PV EOI stuff is all about eliding the EOIs in the guest in order to avoid
> relatively useless VM-Exits (this pre-dates hardware virtualization of EOIs).
>
> Instead of having the guest explicitly do EOI, KVM sets two flags: one to note
> to itself that there is/was a pending PV EOI, and another that's shared with the
> vCPU to track whether or not the guest ack'd the pending PV EOI. On the next
> VM-Exit, KVM checks its internal pending PV EOI flag, and then does an actual
> EOI on behalf of the guest if the shared bit was cleared, i.e. if the PV EOI was
> ack'd by the guest.

Yeah.
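
To make sure I have the handshake straight, here is a toy user-space model of
the two flags. It is only a sketch of my understanding, not the KVM code: every
name (toy_vcpu, toy_inject, etc.) is made up, and the "not ack'd yet" branch in
toy_sync_on_exit() is my assumption about what happens when the shared bit is
still set at the next VM-Exit.

```c
#include <stdbool.h>

/* Toy model; all names here are hypothetical, not KVM identifiers. */
struct toy_vcpu {
	bool host_pending;	/* KVM-internal: "a PV EOI is/was pending" */
	bool shared_bit;	/* lives in guest memory, guest clears it to ack */
	int  in_service;	/* vector currently in service, -1 if none */
};

/* Host side: inject an IRQ and arm PV EOI only if nothing else is pending. */
static void toy_inject(struct toy_vcpu *v, int vector, bool irr_pending)
{
	v->in_service = vector;
	if (irr_pending)
		return;		/* single bit can't track more than one IRQ */
	v->host_pending = true;
	v->shared_bit = true;
}

/*
 * Guest side: returns true if a real EOI write (and thus a VM-Exit on
 * hardware without virtualized EOIs) is still needed.
 */
static bool toy_guest_eoi(struct toy_vcpu *v)
{
	if (v->shared_bit) {
		v->shared_bit = false;	/* ack via shared memory, no exit */
		return false;
	}
	return true;
}

/* Host side: runs on the next VM-Exit. */
static void toy_sync_on_exit(struct toy_vcpu *v)
{
	if (!v->host_pending)
		return;
	v->host_pending = false;

	if (!v->shared_bit) {
		/* Guest ack'd: do the actual EOI on its behalf. */
		v->in_service = -1;
	} else {
		/* Assumption: not ack'd yet, so disarm and let the guest
		 * do an explicit EOI later. */
		v->shared_bit = false;
	}
}
```

With irr_pending set at injection time, toy_inject() never arms the shared bit,
so toy_guest_eoi() always asks for a real EOI write.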

>
> The irr_pending check above effectively disables PV EOI, because PV EOI only has
> a single bit, i.e. can only track a single IRQ, 
>

Right.

> and because KVM needs to know
> precisely when the pending IRQ is unblocked, i.e. can't lazily wait until the
> next VM-Exit.

So KVM needs to make sure the guest actually writes EOI to the APIC in that
case, which is why PV EOI must not be armed?

>
> > The CPU cannot execute another vector in ISR until the highest one is EOI-ed,
> > right?
>
> Nope. The rules for delivery, i.e. for moving an IRQ from IRR => ISR, are that
> the IRQ's priority must be higher than PPR, the exact IRQ isn't in-progress (i.e.
> its ISR bit isn't set), and that IRQs aren't generally blocked by the core/CPU.
>
> From "12.8.4 Interrupt Acceptance for Fixed Interrupts"
>
> If the local APIC receives an interrupt with an interrupt-priority class higher
> than that of the interrupt currently in service, and interrupts are enabled in
> the processor core, the local APIC dispatches the higher priority interrupt to
> the processor immediately (without waiting for a write to the EOI register).

Ah I see. I should have checked this more carefully :-)
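
For my own notes, the delivery rule as I now read it, written as a toy
user-space check. This is a sketch with made-up names (toy_apic,
toy_can_deliver), not KVM code, and it assumes the priority class is the upper
four bits of the vector and that the PPR class is the larger of the TPR class
and the class of the highest in-service vector.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model; all names here are hypothetical, not KVM identifiers. */
struct toy_apic {
	uint8_t isr[32];	/* 256 in-service bits, one per vector */
	uint8_t tpr;		/* task priority register */
};

/* Highest vector with its ISR bit set, or -1 if ISR is empty. */
static int toy_highest_isr(const struct toy_apic *a)
{
	for (int v = 255; v >= 0; v--)
		if (a->isr[v / 8] & (1u << (v % 8)))
			return v;
	return -1;
}

/* PPR priority class: max of the TPR class and the highest ISR class. */
static unsigned int toy_ppr_class(const struct toy_apic *a)
{
	int isrv = toy_highest_isr(a);
	unsigned int isr_class = isrv < 0 ? 0 : (unsigned int)isrv >> 4;
	unsigned int tpr_class = a->tpr >> 4;

	return isr_class > tpr_class ? isr_class : tpr_class;
}

/*
 * An IRQ can move IRR => ISR if its class is above PPR, its own ISR bit
 * isn't already set, and the core isn't blocking IRQs.  No EOI for the
 * vector currently in service is required.
 */
static bool toy_can_deliver(const struct toy_apic *a, uint8_t vector,
			    bool irqs_enabled)
{
	bool in_service = (a->isr[vector / 8] & (1u << (vector % 8))) != 0;

	return irqs_enabled && !in_service &&
	       (unsigned int)(vector >> 4) > toy_ppr_class(a);
}
```

So with vector 0x31 in service, 0x51 can still be delivered (class 5 > class 3)
as long as IRQs are enabled in the core, whereas 0x35 has to wait for the EOI.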

Thanks for all the info!