Re: [PATCH 3/3] KVM: x86: Decouple APICv activation state from apicv_inhibit_reasons
From: Maxim Levitsky
Date: Mon Feb 03 2025 - 20:28:29 EST
On Mon, 2025-02-03 at 15:46 -0800, Sean Christopherson wrote:
> On Mon, Feb 03, 2025, Paolo Bonzini wrote:
> > On 2/3/25 19:45, Sean Christopherson wrote:
> > > Unless there's a very, very good reason to support a use case that generates
> > > ExtInts during boot, but _only_ during boot, and otherwise doesn't have any APICv
> > > ihibits, I'm leaning towards making SVM's IRQ window inhibit sticky, i.e. never
> > > clear it.
> >
> > BIOS tends to use PIT, so that may be too much. With respect to Naveen's report
> > of contention on apicv_update_lock, I would go with the sticky-bit idea but apply
> > it to APICV_INHIBIT_REASON_PIT_REINJ.
>
> That won't work, at least not with yet more changes, because KVM creates the
> in-kernel PIT with reinjection enabled by default. The stick-bit idea is that
> if a bit is set and can never be cleared, then there's no need to track new
> updates. Since userspace needs to explicitly disable reinjection, the inhibit
> can't be sticky.
I confirmed this with a trace, this is indeed the case.
>
> I assume We could fudge around that easily enough by deferring the inhibit until
> a vCPU is created (or run?), but piggybacking PIT_REINJ won't help the userspace
> I/O APIC case.
>
> > I don't love adding another inhibit reason but, together, these two should
> > remove the contention on apicv_update_lock. Another idea could be to move
> > IRQWIN to per-vCPU reason but Maxim tells me that it's not so easy.
I retract this statement, it was based on my knowledge from back when I implemented it.
Looking at the current code again, this should be possible and can be a nice cleanup regardless.
(Or I just might have forgotten the reason that made me think back then that this is not worth it,
because I do remember well that I wanted to make IRQWIN inhibit to be per vcpu)
Basically to do so we need to introduce per-vcpu inhibit field (instead of .vcpu_get_apicv_inhibit_reasons callback)
set the inhibit bit there in svm_enable_irq_window, and raise KVM_REQ_APICV_UPDATE.
Same thing in interrupt_window_interception().
Nested code can be updated to do so as well very easily. IMHO this is a very nice cleanup.
I'll prepare a patch soon for this.
Also regardless, I strongly support Paolo's idea to inhibit APICv/AVIC when more than one ExtINT entry is
enabled, although this might not be enough:
In fact multiple vCPUs with ExtINT enabled I think can trigger the WARN_ON_ONCE below:
kvm_cpu_has_extint will be true on both vCPUs, so kvm_cpu_has_injectable_intr will be true on both
vCPUs as well, and thus both vCPUs can try to pull the interrupt from PIC, with second one likely getting -1,
and that not to mention the possibility of corrupting PIC state due to the concurrent access.
I am talking about this code:
if (kvm_cpu_has_injectable_intr(vcpu)) {
r = can_inject ? kvm_x86_call(interrupt_allowed)(vcpu, true) :
-EBUSY;
if (r < 0)
goto out;
if (r) {
int irq = kvm_cpu_get_interrupt(vcpu);
if (!WARN_ON_ONCE(irq == -1)) {
kvm_queue_interrupt(vcpu, irq, false);
kvm_x86_call(inject_irq)(vcpu, false);
WARN_ON(kvm_x86_call(interrupt_allowed)(vcpu, true) < 0);
}
}
if (kvm_cpu_has_injectable_intr(vcpu))
kvm_x86_call(enable_irq_window)(vcpu);
}
So we might need to do something stronger than only inhibiting APICv/AVIC, we might
want to ignore second ExtINT entry, or maybe even better ignore both ExtInt entries and refuse to deliver ExtINT at all?
(the guest is broken (Intel says that this configuration is frowned upon), so IMHO it deserves to keep both pieces. Do you agree?)
Best regards,
Maxim Levitsky
>
> Oh, yeah, that reminds me of the other reason I would vote for a sticky flag:
> if inhibition really is toggling rapidly, performance is going to be quite bad
> because inhibiting APICv requires (a) zapping APIC SPTEs and (b) serializing
> writers if multiple vCPUs trigger the 0=>1 transition.
>
> And there's some amount of serialization even if there's only a single writer,
> as KVM kicks all vCPUs to toggle APICv (and again to flush TLBs, if necessary).
>
> Hmm, something doesn't add up. Naveen's changelog says:
>
> KVM additionally inhibits AVIC for requesting a IRQ window every time it has
> to inject external interrupts resulting in a barrage of inhibits being set and
> cleared. This shows significant performance degradation compared to AVIC being
> disabled, due to high contention on apicv_update_lock.
>
> But if this is a "real world" use case where the only source of ExtInt is the
> PIT, and kernels typically only wire up the PIT to the BSP, why is there
> contention on apicv_update_lock? APICv isn't actually being toggled, so readers
> blocking writers to handle KVM_REQ_APICV_UPDATE shouldn't be a problem.
>
> Naveen, do you know why there's a contention on apicv_update_lock? Are multiple
> vCPUs actually trying to inject ExtInt?
>