RE: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding
From: Dong, Eddie
Date: Tue Aug 09 2022 - 16:03:13 EST
> -----Original Message-----
> From: Dmytro Maluka <dmy@xxxxxxxxxxxx>
> Sent: Tuesday, August 9, 2022 12:24 AM
> To: Dong, Eddie <eddie.dong@xxxxxxxxx>; Christopherson, Sean
> <seanjc@xxxxxxxxxx>; Paolo Bonzini <pbonzini@xxxxxxxxxx>;
> kvm@xxxxxxxxxxxxxxx
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>;
> Borislav Petkov <bp@xxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>;
> x86@xxxxxxxxxx; H. Peter Anvin <hpa@xxxxxxxxx>; linux-
> kernel@xxxxxxxxxxxxxxx; Eric Auger <eric.auger@xxxxxxxxxx>; Alex
> Williamson <alex.williamson@xxxxxxxxxx>; Liu, Rong L <rong.l.liu@xxxxxxxxx>;
> Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>; Tomasz Nowicki
> <tn@xxxxxxxxxxxx>; Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx>;
> upstream@xxxxxxxxxxxx; Dmitry Torokhov <dtor@xxxxxxxxxx>
> Subject: Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding
>
> On 8/9/22 1:26 AM, Dong, Eddie wrote:
> >>
> >> The existing KVM mechanism for forwarding of level-triggered
> >> interrupts using resample eventfd doesn't work quite correctly in the
> >> case of interrupts that are handled in a Linux guest as oneshot
> >> interrupts (IRQF_ONESHOT). Such an interrupt is acked to the device
> >> in its threaded irq handler, i.e. later than it is acked to the
> >> interrupt controller (EOI at the end of hardirq), not earlier. The
> >> existing KVM code doesn't take that into account, which results in
> >> erroneous extra interrupts in the guest caused by premature re-assert of an
> >> unacknowledged IRQ by the host.
> >
> > Interesting... How does it behave on the native side?
>
> In native it behaves correctly, since Linux masks such a oneshot interrupt at the
> beginning of hardirq, so that the EOI at the end of hardirq doesn't result in its
> immediate re-assert, and then unmasks it later, after its threaded irq handler
> completes.
>
> In handle_fasteoi_irq():
>
>         if (desc->istate & IRQS_ONESHOT)
>                 mask_irq(desc);
>
>         handle_irq_event(desc);
>
>         cond_unmask_eoi_irq(desc, chip);
>
>
> and later in unmask_threaded_irq():
>
>         unmask_irq(desc);
>
> I also mentioned that in the patch #3 description:
> "Linux keeps such interrupt masked until its threaded handler finishes, to
> prevent the EOI from re-asserting an unacknowledged interrupt.
That makes sense. Can you include the full story in the cover letter too?
> However, with KVM + vfio (or whatever is listening on the resamplefd) we don't
> check that the interrupt is still masked in the guest at the moment of EOI.
> Resamplefd is notified regardless, so vfio prematurely unmasks the host
> physical IRQ, thus a new (unwanted) physical interrupt is generated in the host
> and queued for injection to the guest."
>
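For the cover letter, a toy model might make the native-versus-KVM difference concrete. The following is only a minimal sketch, not kernel code: struct toy_pin, toy_sample() and toy_oneshot_cycle() are hypothetical names made up for illustration, and the model assumes a single level-triggered line whose device keeps it asserted until the threaded handler acks it.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (assumption, not kernel code): one level-triggered line
 * feeding one interrupt-controller pin. */
struct toy_pin {
	bool line_asserted;	/* device holds the line until it is acked */
	bool masked;		/* mask bit at the interrupt controller */
	int delivered;		/* interrupts delivered to the CPU */
};

/* Deliver an interrupt if the line is asserted and the pin is unmasked. */
static void toy_sample(struct toy_pin *p)
{
	if (p->line_asserted && !p->masked)
		p->delivered++;
}

/* Run one oneshot interrupt cycle and return the number of deliveries.
 * mask_in_hardirq == true models the native masking described above. */
static int toy_oneshot_cycle(bool mask_in_hardirq)
{
	struct toy_pin p = { .line_asserted = true };

	toy_sample(&p);			/* initial delivery, hardirq runs */
	assert(p.delivered == 1);

	if (mask_in_hardirq)
		p.masked = true;	/* like mask_irq() for IRQS_ONESHOT */

	toy_sample(&p);			/* EOI at end of hardirq resamples the line */

	p.line_asserted = false;	/* threaded handler acks the device */
	p.masked = false;		/* like unmask_irq() after the thread */
	toy_sample(&p);			/* unmask resamples the line again */

	return p.delivered;
}
```

In this model toy_oneshot_cycle(true) returns 1, while toy_oneshot_cycle(false) returns 2: the second delivery is the unwanted one, which is roughly what happens when the resamplefd is notified at EOI regardless of the guest's mask state.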
Emulation of level-triggered IRQs is a pain point ☹
As I understand it, we need to emulate the "level" of the IRQ pin (the line from the device to the IRQ chip, i.e. the IOAPIC here).
Technically, the guest can change the polarity of the vIOAPIC pin, which will lead to a new virtual IRQ
even w/o a host-side interrupt.
The "pending" field of kvm_kernel_irqfd_resampler in patch 3 represents an event rather than an interrupt level.
> >
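To check my reading of the "pending" field, here is a minimal sketch of the postponed-notify logic as I understand it from the cover letter. It is not the code from patch 3: struct toy_resampler and the toy_*() helpers are hypothetical names, and the model assumes mask changes and EOIs arrive serialized.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the resampler state; not the patch's struct. */
struct toy_resampler {
	bool masked;	/* guest's current mask state for the IRQ */
	bool pending;	/* an EOI arrived while masked: a deferred event */
	int notified;	/* times the resamplefd would have been signalled */
};

/* Guest EOI: notify right away if unmasked, otherwise remember the event. */
static void toy_ack(struct toy_resampler *r)
{
	if (r->masked)
		r->pending = true;
	else
		r->notified++;
}

/* Guest mask or unmask: on unmask, flush a deferred notification. */
static void toy_mask_notify(struct toy_resampler *r, bool masked)
{
	r->masked = masked;
	if (!masked && r->pending) {
		r->pending = false;
		r->notified++;
	}
}

/* Oneshot sequence: mask, EOI, unmask. Stores the notification count
 * seen right after the EOI in *at_eoi and returns the final count. */
static int toy_oneshot_notifications(int *at_eoi)
{
	struct toy_resampler r = { 0 };

	toy_mask_notify(&r, true);	/* hardirq masks the oneshot IRQ */
	toy_ack(&r);			/* EOI while still masked */
	*at_eoi = r.notified;
	assert(r.pending);		/* the event is remembered, not lost */

	toy_mask_notify(&r, false);	/* threaded handler done, unmask */
	return r.notified;
}
```

Here the count is 0 right after the EOI and 1 after the unmask. Note that "pending" in this sketch is a single flag, so several EOIs while masked collapse into one notification; that is why I see it as an event and not as the level of the pin.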
> >>
> >> This patch series fixes this issue (for now on x86 only) by checking
> >> if the interrupt is unmasked when we receive irq ack (EOI) and, if
> >> it's masked, postponing resamplefd notify until the guest unmasks it.
> >>
> >> Patches 1 and 2 extend the existing support for irq mask notifiers in
> >> KVM, which is a prerequisite needed for KVM irqfd to use mask
> >> notifiers to know when an interrupt is masked or unmasked.
> >>
> >> Patch 3 implements the actual fix: postponing resamplefd notify in
> >> irqfd until the irq is unmasked.
> >>
> >> Patches 4 and 5 just do some optional renaming for consistency, as we
> >> are now using irq mask notifiers in irqfd along with irq ack notifiers.
> >>
> >> Please see individual patches for more details.
> >>
> >> v2:
> >> - Fixed compilation failure on non-x86: mask_notifier_list moved from
> >> x86 "struct kvm_arch" to generic "struct kvm".
> >> - kvm_fire_mask_notifiers() also moved from x86 to generic code, even
> >> though it is not called on other architectures for now.
> >> - Instead of kvm_irq_is_masked() implemented
> >> kvm_register_and_fire_irq_mask_notifier() to fix potential race
> >> when reading the initial IRQ mask state.
> >> - Renamed for clarity:
> >> - irqfd_resampler_mask() -> irqfd_resampler_mask_notify()
> >> - kvm_irq_has_notifier() -> kvm_irq_has_ack_notifier()
> >> - resampler->notifier -> resampler->ack_notifier
> >> - Reorganized code in irqfd_resampler_ack() and
> >> irqfd_resampler_mask_notify() to make it easier to follow.
> >> - Don't follow unwanted "return type on separate line" style for
> >> irqfd_resampler_mask_notify().
> >>
> >> Dmytro Maluka (5):
> >> KVM: x86: Move irq mask notifiers from x86 to generic KVM
> >> KVM: x86: Add kvm_register_and_fire_irq_mask_notifier()
> >> KVM: irqfd: Postpone resamplefd notify for oneshot interrupts
> >> KVM: irqfd: Rename resampler->notifier
> >> KVM: Rename kvm_irq_has_notifier()
> >>
> >> arch/x86/include/asm/kvm_host.h | 17 +---
> >> arch/x86/kvm/i8259.c | 6 ++
> >> arch/x86/kvm/ioapic.c | 8 +-
> >> arch/x86/kvm/ioapic.h | 1 +
> >> arch/x86/kvm/irq_comm.c | 74 +++++++++++------
> >> arch/x86/kvm/x86.c | 1 -
> >> include/linux/kvm_host.h | 21 ++++-
> >> include/linux/kvm_irqfd.h | 16 +++-
> >> virt/kvm/eventfd.c | 136 ++++++++++++++++++++++++++++----
> >> virt/kvm/kvm_main.c | 1 +
> >> 10 files changed, 221 insertions(+), 60 deletions(-)
> >>
> >> --
> >> 2.37.1.559.g78731f0fdb-goog
> >