[PATCH v4] KVM: x86: ioapic: Optimize EOI handling to reduce unnecessary VM exits

From: weizijie
Date: Mon Mar 03 2025 - 00:28:20 EST


Configuring IOAPIC routed interrupts triggers KVM to rescan all vCPU's
ioapic_handled_vectors which is used to control which vectors need to
trigger EOI-induced VMEXITs. If any interrupt is already in service on
some vCPU using some vector when the IOAPIC is being rescanned, the
vector is set to vCPU's ioapic_handled_vectors. If the vector is then
reused by other interrupts, each of them will cause a VMEXIT even it is
unnecessary. W/o further IOAPIC rescan, the vector remains set, and this
issue persists, impacting guest's interrupt performance.

Both

commit db2bdcbbbd32 ("KVM: x86: fix edge EOI and IOAPIC reconfig race")

and

commit 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI and
IOAPIC reconfigure race")

mentioned this issue, but it was considered as "rare" thus was not
addressed. However in real environment this issue can actually happen
in a well-behaved guest.

Simple Fix Proposal:
A straightforward solution is to record highest in-service IRQ that
is pending at the time of the last scan. Then, upon the next guest
exit, do a full KVM_REQ_SCAN_IOAPIC. This ensures that a re-scan of
the ioapic occurs only when the recorded vector is EOI'd, and
subsequently, the extra bits in the eoi_exit_bitmap are cleared,
avoiding unnecessary VM exits.

Co-developed-by: xuyun <xuyun_xy.xy@xxxxxxxxxxxxxxxxx>
Signed-off-by: xuyun <xuyun_xy.xy@xxxxxxxxxxxxxxxxx>
Signed-off-by: weizijie <zijie.wei@xxxxxxxxxxxxxxxxx>
Reviewed-by: Kai Huang <kai.huang@xxxxxxxxx>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/ioapic.c | 10 ++++++++--
arch/x86/kvm/irq_comm.c | 9 +++++++--
arch/x86/kvm/lapic.c | 13 +++++++++++++
4 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0b7af5902ff7..8c50e7b4a96f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1062,6 +1062,7 @@ struct kvm_vcpu_arch {
#if IS_ENABLED(CONFIG_HYPERV)
hpa_t hv_root_tdp;
#endif
+ u8 last_pending_vector;
};

struct kvm_lpage_info {
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 995eb5054360..40252a800897 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -297,10 +297,16 @@ void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors)
u16 dm = kvm_lapic_irq_dest_mode(!!e->fields.dest_mode);

if (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
- e->fields.dest_id, dm) ||
- kvm_apic_pending_eoi(vcpu, e->fields.vector))
+ e->fields.dest_id, dm))
__set_bit(e->fields.vector,
ioapic_handled_vectors);
+ else if (kvm_apic_pending_eoi(vcpu, e->fields.vector)) {
+ __set_bit(e->fields.vector,
+ ioapic_handled_vectors);
+ vcpu->arch.last_pending_vector = e->fields.vector >
+ vcpu->arch.last_pending_vector ? e->fields.vector :
+ vcpu->arch.last_pending_vector;
+ }
}
}
spin_unlock(&ioapic->lock);
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 8136695f7b96..1d23c52576e1 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -426,9 +426,14 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,

if (irq.trig_mode &&
(kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
- irq.dest_id, irq.dest_mode) ||
- kvm_apic_pending_eoi(vcpu, irq.vector)))
+ irq.dest_id, irq.dest_mode)))
__set_bit(irq.vector, ioapic_handled_vectors);
+ else if (kvm_apic_pending_eoi(vcpu, irq.vector)) {
+ __set_bit(irq.vector, ioapic_handled_vectors);
+ vcpu->arch.last_pending_vector = irq.vector >
+ vcpu->arch.last_pending_vector ? irq.vector :
+ vcpu->arch.last_pending_vector;
+ }
}
}
srcu_read_unlock(&kvm->irq_srcu, idx);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index a009c94c26c2..7c540a0eb340 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1466,6 +1466,19 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
if (!kvm_ioapic_handles_vector(apic, vector))
return;

+ /*
+ * When there are instances where ioapic_handled_vectors is
+ * set due to pending interrupts, clean up the record and do
+ * a full KVM_REQ_SCAN_IOAPIC.
+ * This ensures the vector is cleared in the vCPU's ioapic_handled_vectors,
+ * if the vector is reused by non-IOAPIC interrupts, avoiding unnecessary
+ * EOI-induced VMEXITs for that vector.
+ */
+ if (apic->vcpu->arch.last_pending_vector == vector) {
+ apic->vcpu->arch.last_pending_vector = 0;
+ kvm_make_request(KVM_REQ_SCAN_IOAPIC, apic->vcpu);
+ }
+
/* Request a KVM exit to inform the userspace IOAPIC. */
if (irqchip_split(apic->vcpu->kvm)) {
apic->vcpu->arch.pending_ioapic_eoi = vector;
--
2.43.5