Re: [PATCH] KVM: x86: Clamp the EOI vector if its OOB instead of bugging the kernel
From: Huang, Kai
Date: Tue Jun 23 2026 - 06:30:49 EST
On Mon, 2026-06-22 at 16:55 -0700, Sean Christopherson wrote:
> On Fri, Jun 19, 2026, Kai Huang wrote:
> > On Thu, 2026-06-18 at 11:55 -0700, Sean Christopherson wrote:
> > > If KVM handles an I/O APIC EOI exit request with a bad vector, clamp the
> > > vector to 255 and hope for the best instead of bugging the host. In all
> > > likelihood, a missed EOI is survivable for the guest, and it's most
> > > definitely not remotely fatal to the host, i.e. potentially panicking the
> > > host is completely unjustified. Arbitrarily use 255 for the dummy vector,
> > > the goal is purely to ensure the vector is covered by the bitmap.
> >
> > 255 is a valid vector. How about using a CPU-reserved one instead (e.g., vector
> > 0) and hoping for the best?
>
> I was thinking it would be better to err on the side of spuriously exiting to
> userspace, versus suppressing an exit? And I wanted to keep the vector legal,
> in case something else in KVM cares about legal vectors? Hmm, but using 255 is
> bad because it will likely never be cleared, and thus will block other EOI exits due
> to 255 being the highest priority vector.
>
> Ah, and the field is never explicitly initialized beyond the structure being zeroed,
> so its starting state is '0' as well. My only hesitation with zero is that in
> the unlikely case bit 0 is set in ioapic_handled_vectors, userspace will be extra
> confused.
>
I was actually thinking 0 may be less confusing than 255. A sane userspace
should just know 0 is a bad vector thus should report an error to admin, if not
kill the guest. On the other hand, 255 is a valid vector so userspace may
wrongly EOI an incorrect IRQ, which could be more confusing to the guest or
userspace itself in the end?

Btw, I think killing the guest should be acceptable if such a bug happens? The
existing behaviour is to panic the host anyway ..

> But that's easy enough to deal with, just skip the check.

I am not sure ignoring the IOAPIC EOI exit (in case of this bug) is better than
reporting an invalid vector to userspace. I guess it's fine, since the worst
case is userspace loses the EOI for an IRQ AFAICT, but I am not sure this is
better?
>
> This?
>
>         if (kvm_check_request(KVM_REQ_IOAPIC_EOI_EXIT, vcpu)) {
>                 if (WARN_ON_ONCE(vcpu->arch.pending_ioapic_eoi < 0 ||
>                                  vcpu->arch.pending_ioapic_eoi > 255))
>                         vcpu->arch.pending_ioapic_eoi = 0;
>                 else if (test_bit(vcpu->arch.pending_ioapic_eoi,
>                                   vcpu->arch.ioapic_handled_vectors)) {
>                         vcpu->run->exit_reason = KVM_EXIT_IOAPIC_EOI;
>                         vcpu->run->eoi.vector =
>                                         vcpu->arch.pending_ioapic_eoi;
>                         r = 0;
>                         goto out;
>                 }
>         }
Either way works for me. I am starting to think we care about this too much --
it's definitely better than BUG_ON() for sure :-)
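
As an illustration only (not part of the patch), the control flow of the
snippet quoted above can be modeled in plain userspace C. The struct
`vcpu_model` and the `model_*` helpers are hypothetical stand-ins for KVM's
vCPU state, `test_bit()` and `WARN_ON_ONCE()`; none of them exist in the
kernel. The sketch only shows that an out-of-range value is reset to 0 without
exiting, while an in-range vector causes an exit only if its bit is set in the
handled-vectors bitmap.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_VECTORS    256
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_LONGS  ((NR_VECTORS + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Hypothetical model of the two fields discussed in the thread. */
struct vcpu_model {
	int pending_ioapic_eoi;
	unsigned long ioapic_handled_vectors[BITMAP_LONGS];
};

/* Userspace stand-in for the kernel's test_bit(). */
static bool model_test_bit(unsigned int nr, const unsigned long *map)
{
	return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

/* Userspace stand-in for the kernel's set_bit() (non-atomic here). */
static void model_set_bit(unsigned int nr, unsigned long *map)
{
	map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/*
 * Returns true if the model would exit to userspace, storing the vector in
 * *vector. An out-of-range pending value is reported, reset to 0, and the
 * bitmap check is skipped, so no exit happens even if bit 0 is set.
 */
static bool model_handle_eoi_request(struct vcpu_model *v, int *vector)
{
	if (v->pending_ioapic_eoi < 0 || v->pending_ioapic_eoi > 255) {
		fprintf(stderr, "warning: bad EOI vector %d, resetting to 0\n",
			v->pending_ioapic_eoi);
		v->pending_ioapic_eoi = 0;
		return false;
	}

	if (model_test_bit((unsigned int)v->pending_ioapic_eoi,
			   v->ioapic_handled_vectors)) {
		*vector = v->pending_ioapic_eoi;
		return true;
	}

	return false;
}
```

With bit 0 set in the bitmap and `pending_ioapic_eoi` at 300, the model resets
the field to 0 and does not exit, which is the "just skip the check" behaviour
discussed above; an in-range vector whose bit is set still exits as usual.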