[PATCH v2 0/3] Fix 'Spurious APIC interrupt (vector 0xFF) on CPU#n' issue

From: Maxim Levitsky
Date: Wed Jul 26 2023 - 10:00:52 EST


Recently we found an issue which causes the following error message
to be logged occasionally if the guest has a VFIO device attached:

'Spurious APIC interrupt (vector 0xFF) on CPU#0, should never happen'

It was traced to an incorrect APICv inhibition bug introduced by
'KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC base'
(all of these issues are now fixed).

However, there are valid cases for the APICv to be inhibited and it should not
cause spurious interrupts to be injected to the guest.

After some debugging, the root cause was found: __kvm_apic_update_irr
doesn't set irr_pending, which later triggers an int->unsigned char
conversion bug that leads to the wrong 0xFF injection.
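
To illustrate the conversion part in isolation, here is a minimal userspace
sketch (not the kernel code). It assumes a lookup that returns -1 when no
vector is pending and a caller that stores the result in an 8-bit field; all
function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a "get pending vector" lookup:
 * returns the vector (0..255), or -1 when nothing is pending. */
int get_pending_vector(int have_pending, uint8_t vector)
{
    return have_pending ? (int)vector : -1;
}

/* Unchecked variant: the int result is narrowed to 8 bits as-is,
 * so the "nothing pending" value -1 silently becomes vector 0xFF. */
uint8_t inject_unchecked(int have_pending, uint8_t vector)
{
    int irq = get_pending_vector(have_pending, vector);

    return (uint8_t)irq;
}

/* Checked variant: returns 1 and stores the vector in *out only when
 * the lookup produced a real vector; returns 0 otherwise. */
int inject_checked(int have_pending, uint8_t vector, uint8_t *out)
{
    int irq = get_pending_vector(have_pending, vector);

    if (irq == -1)
        return 0;

    assert(irq >= 0 && irq <= 0xFF);
    *out = (uint8_t)irq;
    return 1;
}
```

With nothing pending, inject_unchecked() yields 0xFF while inject_checked()
reports that there is nothing to inject.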

This also leads to an unbounded delay in injecting the interrupt and hurts
performance.
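
The delay can be pictured with a toy model, assuming the lookup of the highest
pending vector is gated by a cached flag. This is only a sketch: the struct
and the toy_* helpers are hypothetical and much simpler than the real lapic
code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_apic {
    uint32_t irr[8];   /* 256 request bits, one per vector */
    bool irr_pending;  /* cached hint: some IRR bit may be set */
};

/* Returns the highest requested vector, or -1. The scan is skipped
 * entirely when the cached flag says nothing is pending. */
int toy_find_highest_irr(const struct toy_apic *apic)
{
    if (!apic->irr_pending)
        return -1;

    for (int word = 7; word >= 0; word--) {
        for (int bit = 31; bit >= 0; bit--) {
            if (apic->irr[word] & (UINT32_C(1) << bit))
                return word * 32 + bit;
        }
    }
    return -1;
}

/* Sets a request bit; update_flag selects whether the cached flag is
 * maintained (true) or forgotten (false, modelling the bug). */
void toy_set_irr(struct toy_apic *apic, int vector, bool update_flag)
{
    assert(vector >= 0 && vector < 256);
    apic->irr[vector / 32] |= UINT32_C(1) << (vector % 32);
    if (update_flag)
        apic->irr_pending = true;
}
```

If the bit is set without updating the flag, the vector stays invisible to
toy_find_highest_irr() until something else happens to set the flag.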

In addition to that, I also noticed that __kvm_apic_update_irr is not atomic
with regard to IRR, which can lead to an even harder-to-debug bug.
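
As a rough userspace analogy using C11 atomics (an assumption for
illustration, not the kernel primitives and not necessarily the approach taken
in patch 1), one way to merge posted bits into an IRR word without losing a
concurrent update is a compare-and-swap loop. The function name is
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Moves all posted bits from *pir into *irr and returns the resulting
 * IRR word. A plain load / OR / store on *irr could overwrite a bit that
 * another writer set between the load and the store; the CAS loop
 * retries with the fresh value instead. */
uint32_t merge_pir_into_irr(_Atomic uint32_t *irr, _Atomic uint32_t *pir)
{
    /* Claim the posted bits and clear them in one atomic step. */
    uint32_t pir_val = atomic_exchange(pir, 0);
    uint32_t old = atomic_load(irr);
    uint32_t new_val;

    do {
        new_val = old | pir_val;
        /* On failure, 'old' is reloaded with the current *irr value. */
    } while (!atomic_compare_exchange_weak(irr, &old, new_val));

    assert((new_val & pir_val) == pir_val);
    return new_val;
}
```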

V2: applied Paolo's feedback for patch 1.

Best regards,
Maxim Levitsky

Maxim Levitsky (3):
  KVM: x86: VMX: __kvm_apic_update_irr must update the IRR atomically
  KVM: x86: VMX: set irr_pending in kvm_apic_update_irr
  KVM: x86: check the kvm_cpu_get_interrupt result before using it

 arch/x86/kvm/lapic.c | 25 +++++++++++++++++--------
 arch/x86/kvm/x86.c   | 10 +++++++---
 2 files changed, 24 insertions(+), 11 deletions(-)

--
2.26.3