Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding
From: Dmytro Maluka
Date: Tue Aug 09 2022 - 19:30:40 EST
On 8/9/22 10:01 PM, Dong, Eddie wrote:
>
>
>> -----Original Message-----
>> From: Dmytro Maluka <dmy@xxxxxxxxxxxx>
>> Sent: Tuesday, August 9, 2022 12:24 AM
>> To: Dong, Eddie <eddie.dong@xxxxxxxxx>; Christopherson, Sean
>> <seanjc@xxxxxxxxxx>; Paolo Bonzini <pbonzini@xxxxxxxxxx>;
>> kvm@xxxxxxxxxxxxxxx
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>; Ingo Molnar <mingo@xxxxxxxxxx>;
>> Borislav Petkov <bp@xxxxxxxxx>; Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>;
>> x86@xxxxxxxxxx; H. Peter Anvin <hpa@xxxxxxxxx>; linux-
>> kernel@xxxxxxxxxxxxxxx; Eric Auger <eric.auger@xxxxxxxxxx>; Alex
>> Williamson <alex.williamson@xxxxxxxxxx>; Liu, Rong L <rong.l.liu@xxxxxxxxx>;
>> Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>; Tomasz Nowicki
>> <tn@xxxxxxxxxxxx>; Grzegorz Jaszczyk <jaz@xxxxxxxxxxxx>;
>> upstream@xxxxxxxxxxxx; Dmitry Torokhov <dtor@xxxxxxxxxx>
>> Subject: Re: [PATCH v2 0/5] KVM: Fix oneshot interrupts forwarding
>>
>> On 8/9/22 1:26 AM, Dong, Eddie wrote:
>>>>
>>>> The existing KVM mechanism for forwarding of level-triggered
>>>> interrupts using resample eventfd doesn't work quite correctly in the
>>>> case of interrupts that are handled in a Linux guest as oneshot
>>>> interrupts (IRQF_ONESHOT). Such an interrupt is acked to the device
>>>> in its threaded irq handler, i.e. later than it is acked to the
>>>> interrupt controller (EOI at the end of hardirq), not earlier. The
>>>> existing KVM code doesn't take that into account, which results in
>>>> erroneous extra interrupts in the guest caused by premature re-assert of an
>>>> unacknowledged IRQ by the host.
>>>
>>> Interesting... How does it behave on the native side?
>>
>> In native it behaves correctly, since Linux masks such a oneshot interrupt at the
>> beginning of hardirq, so that the EOI at the end of hardirq doesn't result in its
>> immediate re-assert, and then unmasks it later, after its threaded irq handler
>> completes.
>>
>> In handle_fasteoi_irq():
>>
>> 	if (desc->istate & IRQS_ONESHOT)
>> 		mask_irq(desc);
>>
>> 	handle_irq_event(desc);
>>
>> 	cond_unmask_eoi_irq(desc, chip);
>>
>>
>> and later in unmask_threaded_irq():
>>
>> 	unmask_irq(desc);
>>
>> I also mentioned that in patch #3 description:
>> "Linux keeps such interrupt masked until its threaded handler finishes, to
>> prevent the EOI from re-asserting an unacknowledged interrupt.
>
> That makes sense. Can you include the full story in cover letter too?
Ok, I will.
>
>
>> However, with KVM + vfio (or whatever is listening on the resamplefd) we don't
>> check that the interrupt is still masked in the guest at the moment of EOI.
>> Resamplefd is notified regardless, so vfio prematurely unmasks the host
>> physical IRQ, thus a new (unwanted) physical interrupt is generated in the host
>> and queued for injection to the guest."
>>
>
> Emulation of level triggered IRQ is a pain point ☹
> I read we need to emulate the "level" of the IRQ pin (connecting from device to IRQchip, i.e. ioapic here).
> Technically, the guest can change the polarity of vIOAPIC, which will lead to a new virtual IRQ
> even w/o host side interrupt.
Thanks, interesting point. Do you mean that this behavior (a new vIRQ as
a result of polarity change) may already happen with the existing KVM code?

It doesn't seem so to me. AFAICT, KVM completely ignores the vIOAPIC
polarity bit; in particular, it doesn't handle a change of polarity by
the guest (i.e. it doesn't update the virtual IRR register, and so on), so
such a change shouldn't result in a new interrupt.

Since commit 100943c54e09 ("kvm: x86: ignore ioapic polarity") there
seems to be an assumption that KVM interprets the IRQ level value as
active (asserted) vs inactive (deasserted) rather than high vs low, i.e.
the polarity doesn't matter to KVM.

So, since both sides (KVM emulating the IOAPIC, and vfio/whatever
emulating an external interrupt source) seem to operate at the
"asserted" vs "deasserted" level of abstraction regardless of the
polarity, and that seems to be a feature rather than a bug, it seems that
we don't need to emulate the IRQ level as such. Or am I missing something?

OTOH, I guess this means that KVM's existing emulation of
level-triggered interrupts is somewhat limited (a guest may legitimately
expect an interrupt to fire as a result of a polarity change, and that case
is not supported by KVM). But that is rather out of scope of the oneshot
interrupts issue addressed by this patchset.
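
To make that scope concrete, here is a rough user-space model of the
deferral idea behind patch 3: if the EOI arrives while the guest still has
the interrupt masked, remember that as a pending event and only notify the
resamplefd listener on unmask. This is only a sketch under my own
assumptions, not the code from the patch: struct resampler_model and the
model_*() helpers are hypothetical names, and all locking and the real
eventfd signalling are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, NOT the kernel structures used by the patch. */
struct resampler_model {
	bool masked;      /* guest currently has the vIRQ masked */
	bool pending;     /* an EOI arrived while masked; notify is deferred */
	int notify_count; /* how many times resamplefd would be signalled */
};

/* Stand-in for signalling the resamplefd listener (e.g. vfio). */
static void model_notify(struct resampler_model *r)
{
	r->notify_count++;
}

/* Guest EOI: notify right away only if the vIRQ is not masked. */
static void model_ack(struct resampler_model *r)
{
	if (r->masked)
		r->pending = true;
	else
		model_notify(r);
}

/* Guest mask/unmask: deliver a deferred notify on unmask. */
static void model_mask_notify(struct resampler_model *r, bool masked)
{
	r->masked = masked;
	if (!masked && r->pending) {
		r->pending = false;
		model_notify(r);
	}
}

/* Oneshot flow: mask in hardirq, EOI, unmask after the threaded handler. */
static int model_oneshot_sequence(void)
{
	struct resampler_model r = { false, false, 0 };

	model_mask_notify(&r, true);   /* start of hardirq */
	model_ack(&r);                 /* EOI at the end of hardirq */
	assert(r.notify_count == 0);   /* notify postponed while masked */
	model_mask_notify(&r, false);  /* threaded handler finished */
	return r.notify_count;
}
```

In this model the oneshot sequence yields exactly one notify, and it
happens at unmask time rather than at EOI time. For an interrupt that is
never masked, model_ack() notifies immediately, as before.
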
> "pending" field of kvm_kernel_irqfd_resampler in patch 3 means more an event rather than an interrupt level.
>
>
>>>
>>>>
>>>> This patch series fixes this issue (for now on x86 only) by checking
>>>> if the interrupt is unmasked when we receive irq ack (EOI) and, if
>>>> it is masked, postponing resamplefd notify until the guest unmasks it.
>>>>
>>>> Patches 1 and 2 extend the existing support for irq mask notifiers in
>>>> KVM, which is a prerequisite needed for KVM irqfd to use mask
>>>> notifiers to know when an interrupt is masked or unmasked.
>>>>
>>>> Patch 3 implements the actual fix: postponing resamplefd notify in
>>>> irqfd until the irq is unmasked.
>>>>
>>>> Patches 4 and 5 just do some optional renaming for consistency, as we
>>>> are now using irq mask notifiers in irqfd along with irq ack notifiers.
>>>>
>>>> Please see individual patches for more details.
>>>>
>>>> v2:
>>>> - Fixed compilation failure on non-x86: mask_notifier_list moved from
>>>> x86 "struct kvm_arch" to generic "struct kvm".
>>>> - kvm_fire_mask_notifiers() also moved from x86 to generic code, even
>>>> though it is not called on other architectures for now.
>>>> - Instead of kvm_irq_is_masked() implemented
>>>> kvm_register_and_fire_irq_mask_notifier() to fix potential race
>>>> when reading the initial IRQ mask state.
>>>> - Renamed for clarity:
>>>> - irqfd_resampler_mask() -> irqfd_resampler_mask_notify()
>>>> - kvm_irq_has_notifier() -> kvm_irq_has_ack_notifier()
>>>> - resampler->notifier -> resampler->ack_notifier
>>>> - Reorganized code in irqfd_resampler_ack() and
>>>> irqfd_resampler_mask_notify() to make it easier to follow.
>>>> - Don't follow unwanted "return type on separate line" style for
>>>> irqfd_resampler_mask_notify().
>>>>
>>>> Dmytro Maluka (5):
>>>> KVM: x86: Move irq mask notifiers from x86 to generic KVM
>>>> KVM: x86: Add kvm_register_and_fire_irq_mask_notifier()
>>>> KVM: irqfd: Postpone resamplefd notify for oneshot interrupts
>>>> KVM: irqfd: Rename resampler->notifier
>>>> KVM: Rename kvm_irq_has_notifier()
>>>>
>>>> arch/x86/include/asm/kvm_host.h | 17 +---
>>>> arch/x86/kvm/i8259.c | 6 ++
>>>> arch/x86/kvm/ioapic.c | 8 +-
>>>> arch/x86/kvm/ioapic.h | 1 +
>>>> arch/x86/kvm/irq_comm.c | 74 +++++++++++------
>>>> arch/x86/kvm/x86.c | 1 -
>>>> include/linux/kvm_host.h | 21 ++++-
>>>> include/linux/kvm_irqfd.h | 16 +++-
>>>> virt/kvm/eventfd.c | 136 ++++++++++++++++++++++++++++----
>>>> virt/kvm/kvm_main.c | 1 +
>>>> 10 files changed, 221 insertions(+), 60 deletions(-)
>>>>
>>>> --
>>>> 2.37.1.559.g78731f0fdb-goog
>>>