On 02/05/20 11:24, Suravee Suthikulpanit wrote:
....
The questions to answer are: what is causing the re-entrancy? and why
is dropping the second EOI update safe?
The answer to the latter could well be "because we've already processed
it", but the answer to the former is more important.
The re-entrancy happens because the irq state is the OR of
the interrupt state and the resamplefd state. That is, we don't
want to show the state as 0 until we've had a chance to set the
resamplefd. But if the interrupt has _not_ gone low then we get an
infinite loop.
So the actual root cause is that this is a level-triggered interrupt,
otherwise irqfd_inject would immediately set the KVM_USERSPACE_IRQ_SOURCE_ID
high and then low and you wouldn't have the infinite loop.
But in the case of level-triggered interrupts the VMEXIT already happens because
TMR is set; only edge-triggered interrupts need the lazy invocation
of the ack notifier.
So this should be the fix:
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 7668fed1ce65..ca2d73cd00a3 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -225,12 +225,12 @@ static int ioapic_set_irq(struct kvm_ioapic *ioapic, unsigned int irq,
}
/*
- * AMD SVM AVIC accelerate EOI write and do not trap,
- * in-kernel IOAPIC will not be able to receive the EOI.
+ * AMD SVM AVIC accelerate EOI write iff the interrupt is level
+ * triggered, in-kernel IOAPIC will not be able to receive the EOI.
* In this case, we do lazy update of the pending EOI when
* trying to set IOAPIC irq.
*/
- if (kvm_apicv_activated(ioapic->kvm))
+ if (edge && kvm_apicv_activated(ioapic->kvm))
ioapic_lazy_update_eoi(ioapic, irq);
/*
Did I miss anything in the above analysis with respect to AVIC?