Re: [PATCH 3/3] KVM: x86: Decouple APICv activation state from apicv_inhibit_reasons

From: Paolo Bonzini
Date: Tue Feb 04 2025 - 09:11:50 EST


On Tue, Feb 4, 2025 at 12:15 PM Naveen N Rao <naveen@xxxxxxxxxx> wrote:
> As a separate change, I have been testing a patch that moves the
> PIT_REINJ inhibit from PIT creation to the point at which the guest
> actually programs it so that default guest configurations can utilize
> AVIC:

In-kernel PIC and PIT is sort of a legacy setup; the so-called
"split irqchip" (LAPIC in KVM, PIC/PIT in userspace) is strongly
preferred. So I don't think it's particularly important to cater
for PIT_REINJ.

> If it is, or if we choose to delay PIT_REINJ inhibit to vcpu creation time,
> then making PT_REINJ or IRQWIN inhibits sticky will prevent AVIC from being
> enabled later on. I can see in my tests that BIOS (both seabios and edk2)
> programs the PIT though Linux guest itself doesn't (unless -no-hpet is used).

Even with -no-hpet, Linux should turn off the PIT relatively soon
and only rely on the local APIC's timer.

> You're right -- APICv isn't actually being toggled, but IRQWIN inhibit is
> constantly being set and cleared while trying to inject device interrupts into
> the guests. The places where we set/clear IRQWIN inhibit has comments
> indicating that it is only required for ExtINT, though that's not actually the
> case here.
>
> What is actually happening is that since the PIT is in reinject mode, APICv is
> not active in the guest. When that happens, kvm_cpu_has_injectable_intr()
> returns true when any interrupt is pending:
>
> /*
> * check if there is injectable interrupt:
> * when virtual interrupt delivery enabled,
> * interrupt from apic will handled by hardware,
> * we don't need to check it here.
> */
> int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
> {
> if (kvm_cpu_has_extint(v))
> return 1;
>
> if (!is_guest_mode(v) && kvm_vcpu_apicv_active(v))
> return 0;
>
> return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
> }
>
> The second if condition fails since APICv is not active. So,
> kvm_check_and_inject_events() calls enable_irq_window() to request for an IRQ
> window to inject those interrupts.

Ok, that's due solely to the presence of *another* active inhibit.
Since sticky inhibits cannot work, making the IRQWIN inhibit per-CPU
will still cause vCPUs to pound on the apicv_update_lock, but only on
the read side of the rwsem so that should be more tolerable.

Using atomics is considerably more complicated and I'd rather avoid it.

Paolo