Re: [PATCH] KVM: x86: ioapic: Fix level-triggered EOI and userspace IOAPIC reconfigure race

From: Sean Christopherson
Date: Wed Dec 07 2022 - 11:13:15 EST


On Wed, Dec 07, 2022, Adamos Ttofari wrote:
> When split-irqchip is used KVM uses ioapic_handled_vectors to identify
> which vectors require an exit to userspace IOAPIC. Unfortunately, when the
> IOAPIC is reconfigured while the interrupt is being handled, it will use
> the newest configuration; therefore, the EOI will not be delivered to
> IOAPIC.
>
> A previous commit 0fc5a36dd6b3

No need for "A previous", the fact that there's a stable commit hash means it
happened in the past.

> ("KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race")
> fixed the race for kernel ioapic, but the issue still persists for

s/ioapic/IOAPIC to be consistent, and because the I/O APIC is a proper thing.

> userspace IOAPIC:
>
> 1) Userspace IOAPIC sends a level triggered interrupt to VCPU0.
> 2) VCPU0's handler reconfigures the IOAPIC to route the interrupts to
> VCPU1. (This can cause userspace IOAPIC to commit a new routing table,
> eventually leading KVM to unset the vector in ioapic_handled_vectors)
> 3) VCPU0 triggers an EOI, and it's not delivered to userspace IOAPIC
> because the vector bit is not set in ioapic_handled_vectors.
> 4) The loss of EOI, leaves remote_irr in IOAPIC set. Eventually blocking
> new interrupts.
>
> To avoid the above scenario, we

Please avoid pronouns in shortlogs, changelogs, and comments, as pronouns are
often ambiguous. "it" and "they" are ok if the thing(s) being referred to is
a/the subject and was recently introduced/referenced, e.g. in the same sentence,
but "we", "us", "I", and "me" should never be used.

> should apply a similar fix like

State what the patch actually does, not what it "should" or "might" do.

> commit 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC
> reconfigure race") Which

Lack of punctuation, followed by random capitalization.

> is to add all pending and running vectors to ioapic_handled_vectors.

Please describe what this actually does (intercepts EOI for the relevant vectors).

For the changelog as a whole, some maintainers/subsystems prefer leading with the
"why", but I strongly prefer that changelogs state what the patch actually does
and then provide the background/justification.

Copy-pasting from a prior discussion[*]:

: To some extent, it's a personal preference, e.g. I
: find it easier to understand the details (why something is a problem) if I have
: the extra context of how a problem is fixed (or: what code was broken).
:
: But beyond personal preference, there are less subjective reasons for stating
: what a patch does before diving into details. First and foremost, what code is
: actually being changed is the most important information, and so that information
: should be easy to find. Changelogs that bury the "what's actually changing" in a
: one-liner after 3+ paragraphs of background make it very hard to find that information.
:
: Maybe for initial review one could argue that "what's the bug" is more important,
: but for skimming logs and git archeology, the gory details matter less and less.
: E.g. when doing a series of "git blame", the details of each change along the way
: are useless, the details only matter for the culprit; I just want to quickly
: determine whether or not a commit might be of interest.
:
: Another argument for stating "what's changing" first is that it's almost always
: possible to state "what's changing" in a single sentence. Conversely, all but the
: most simple bugs require multiple sentences or paragraphs to fully describe the
: problem. If both the "what's changing" and "what's the bug" are super short then
: the order doesn't matter. But if one is shorter (almost always the "what's changing),
: then covering the shorter one first is advantageous because it's less of an
: inconvenience for readers/reviewers that have a strict ordering preference. E.g.
: having to skip one sentence to get to the stuff you care about is less painful than
: me having to skip three paragraphs to get to the stuff that I care about.

And similar to the suggestion that was made in that discussion, the changelog for
this patch could be:

When scanning userspace IOAPIC entries, intercept EOI for level-triggered
IRQs if the current vCPU has a pending and/or in-service IRQ for the
vector in its local API, even if the vCPU doesn't match the new entry's
destination. This fixes a race between userspace IOAPIC reconfiguration
and IRQ delivery that results in the vector's bit being left set in the
remote IRR due to the eventual EOI not being forwarded to the userspace
IOAPIC.

Commit 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC
reconfigure race") fixed the in-kernel IOAPIC, but not the userspace
IOAPIC configuration, which has a similar race.

<wall of text>

[*] https://lore.kernel.org/all/YurKx+gFAWPvj35L@xxxxxxxxxx

> Fixes: 0fc5a36dd6b3 ("KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race")
>
> Signed-off-by: Adamos Ttofari <attofari@xxxxxxxxx>
> ---
> arch/x86/kvm/irq_comm.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 0687162c4f22..36d65997a212 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -426,8 +426,8 @@ void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
> kvm_set_msi_irq(vcpu->kvm, entry, &irq);
>
> if (irq.trig_mode &&
> - kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
> - irq.dest_id, irq.dest_mode))
> + (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT, irq.dest_id,
> + irq.dest_mode) || kvm_apic_pending_eoi(vcpu, irq.vector)))

This formatting, or lack thereof, is extremely difficult to read. It also
unnecessarily runs a fair bit over the 80 char soft limit.

if (irq.trig_mode &&
(kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT,
irq.dest_id, irq.dest_mode) ||
kvm_apic_pending_eoi(vcpu, irq.vector)))
__set_bit(irq.vector, ioapic_handled_vectors);