Re: [PATCH 3/3] KVM: x86: Decouple APICv activation state from apicv_inhibit_reasons
From: Sean Christopherson
Date: Mon Feb 03 2025 - 13:45:38 EST
On Mon, Feb 03, 2025, Naveen N Rao (AMD) wrote:
> apicv_inhibit_reasons is used to determine if APICv is active, and if
> not, the reason(s) why it may be inhibited. In some scenarios, inhibit
> reasons can be set and cleared often, resulting in increased contention
> on apicv_update_lock used to guard updates to apicv_inhibit_reasons.
>
> In particular, if a guest is using PIT in reinject mode (the default)
> and if AVIC is enabled in kvm_amd kernel module, we inhibit AVIC during
> kernel PIT creation (APICV_INHIBIT_REASON_PIT_REINJ), resulting in KVM
> emulating x2APIC for the guest. In that case, since AVIC is enabled in
> the kvm_amd kernel module, KVM additionally inhibits AVIC to request an
> IRQ window every time it has to inject external interrupts, resulting
> in a barrage of inhibits being set and cleared. This shows significant
> performance degradation compared to AVIC being disabled, due to high
> contention on apicv_update_lock.
>
> Though apicv_update_lock is being used to guard updates to
> apicv_inhibit_reasons, it is only necessary if the APICv activation
> state changes. Introduce a separate boolean, apicv_activated, to track
> if APICv is active or not, and limit use of apicv_update_lock for when
> APICv is being (de)activated. Convert apicv_inhibit_reasons to an atomic
> and use atomic operations to fetch/update it.
It's a moot point, but there is too much going on in this patch. For a change
like this, it should be split into ~4 patches:
1. Add a kvm_x86_ops.required_apicv_inhibits check outside of the lock,
which we can and should do anyways. I would probably vote to turn the one
inside the lock into a WARN, as that "optimized" path is only an optimization
if the inhibit applies to both SVM and VMX.
2. Use a bool to track the activated state.
3. Use an atomic.
4. Implement the optimized updates.
#3 is optional, #1 and #2 are not.
It's a moot point though, because toggling APICv inhibits on IRQ windows is
laughably broken. I'm guessing it "works" because the only time it triggers in
practice is in conjunction with PIT re-injection.
1. vCPU0 waits for an IRQ window, APICV_INHIBIT_REASON_IRQWIN = 1
2. vCPU1 waits for an IRQ window, APICV_INHIBIT_REASON_IRQWIN = 1
3. vCPU1 gets its IRQ window, APICV_INHIBIT_REASON_IRQWIN = 0
4. vCPU0 is sad
APICV_INHIBIT_REASON_BLOCKIRQ is also tied to per-vCPU state, but is a-ok because
KVM walks all vCPUs to generate the inhibit. That approach won't work for IRQ windows
though, because it's obviously not a slow path.
What is generating so many external interrupts? E.g. is the guest using the PIT
for TSC or APIC calibration? I suppose it doesn't matter terribly in this case
since APICV_INHIBIT_REASON_PIT_REINJ is already set.
Unless there's a very, very good reason to support a use case that generates
ExtInts during boot, but _only_ during boot, and otherwise doesn't have any APICv
inhibits, I'm leaning towards making SVM's IRQ window inhibit sticky, i.e. never
clear it. And then we can more simply optimize kvm_set_or_clear_apicv_inhibit()
to do nothing if the inhibit is sticky and already set.
E.g. something like this:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6d4a6734b2d6..6f926fd6fc1c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10629,6 +10629,14 @@ void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
if (!enable_apicv)
return;
+ if (kvm_is_apicv_inhibit_sticky(reason)) {
+ if (WARN_ON_ONCE(!set))
+ return;
+
+ if (kvm_test_apicv_inhibit(reason))
+ return;
+ }
+
down_write(&kvm->arch.apicv_update_lock);
__kvm_set_or_clear_apicv_inhibit(kvm, reason, set);
up_write(&kvm->arch.apicv_update_lock);