Re: [PATCH 1/6] KVM: nVMX: Get to-be-acknowledge IRQ for nested VM-Exit at injection site

From: Sean Christopherson
Date: Wed Sep 04 2024 - 20:38:04 EST


On Wed, Sep 04, 2024, Sean Christopherson wrote:
> On Wed, Sep 04, 2024, Nathan Chancellor wrote:
> > I bisected (log below) an issue with starting a nested guest that
> > appears on two of my newer Intel test machines (but not a somewhat old
> > laptop) when this change as commit 6f373f4d941b ("KVM: nVMX: Get
> > to-be-acknowledge IRQ for nested VM-Exit at injection site") in -next is
> > present in the host kernel.
> >
> > I start a virtual machine with a full distribution using QEMU then start
> > a nested virtual machine using QEMU with the same kernel and a much
> > simpler Buildroot initrd, just to test the ability to run a nested
> > guest. After this change, starting a nested guest results in no output
> > from the nested guest and eventually the first guest restarts, sometimes
> > printing a lockup message that appears to be caused from qemu-system-x86
>
> *sigh*
>
> It's not you, it's me.
>
> I just bisected hangs in my nested setup to this same commit. Apparently, I
> completely and utterly failed at testing.
>
> There isn't that much going on here, so knock wood, getting a root cause shouldn't
> be terribly difficult.

Well fudge. My attempt to avoid splitting kvm_get_apic_interrupt() and exposing
more lapic.c internals to nested VMX failed spectacularly.

Hiding down in apic_set_isr() is a call to hwapic_isr_update(), which updates
vmcs.GUEST_INTERRUPT_STATUS.SVI to mirror the highest vector in the virtual APIC's
ISR. On a nested VM-Exit due to an IRQ, that update is supposed to hit vmcs01.
By moving the call to kvm_get_apic_interrupt() out of nested_vmx_vmexit(), that
update hits vmcs02 instead, and things go downhill from there.
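
To make the ordering problem concrete, here's a toy model in plain C. This is
not KVM code: every name below (toy_vmcs, toy_vcpu, toy_ack_irq(), etc.) is
made up for illustration, and the only thing it models is "ACKing an IRQ writes
SVI into whichever VMCS is currently loaded".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model only; none of these names exist in KVM. */
struct toy_vmcs {
	uint8_t svi;	/* stands in for GUEST_INTERRUPT_STATUS.SVI */
};

struct toy_vcpu {
	struct toy_vmcs vmcs01;		/* VMCS for running L1 */
	struct toy_vmcs vmcs02;		/* VMCS for running L2 */
	struct toy_vmcs *loaded;	/* whichever VMCS is currently loaded */
};

/* Start with L2 running, i.e. with vmcs02 loaded. */
static void toy_vcpu_init(struct toy_vcpu *vcpu)
{
	memset(vcpu, 0, sizeof(*vcpu));
	vcpu->loaded = &vcpu->vmcs02;
}

/* Models the side effect of the ACK: SVI lands in the loaded VMCS. */
static void toy_ack_irq(struct toy_vcpu *vcpu, uint8_t vector)
{
	vcpu->loaded->svi = vector;
}

/* Models the nested VM-Exit switching from vmcs02 back to vmcs01. */
static void toy_switch_to_vmcs01(struct toy_vcpu *vcpu)
{
	vcpu->loaded = &vcpu->vmcs01;
}

/* Old ordering: ACK from within the VM-Exit path, after the switch. */
static void toy_ack_inside_vmexit(struct toy_vcpu *vcpu, uint8_t vector)
{
	toy_switch_to_vmcs01(vcpu);
	toy_ack_irq(vcpu, vector);	/* SVI goes to vmcs01, as intended */
}

/* Broken ordering: ACK at the injection site, before the switch. */
static void toy_ack_before_vmexit(struct toy_vcpu *vcpu, uint8_t vector)
{
	toy_ack_irq(vcpu, vector);	/* SVI goes to vmcs02, which is wrong */
	toy_switch_to_vmcs01(vcpu);
}
```

With vmcs02 loaded, toy_ack_inside_vmexit() leaves the vector in vmcs01.svi,
whereas toy_ack_before_vmexit() leaves it in vmcs02.svi and vmcs01.svi stays
zero.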

The obvious/easy solution is to split kvm_get_apic_interrupt() so that nVMX can
find an interrupt, emulate nested VM-Exit or posted interrupt processing as
appropriate, and _then_ ACK the IRQ (if a VM-Exit was synthesized). It's not
really any harder than what I did here; as noted above, I just didn't want to split
kvm_get_apic_interrupt(). But I don't see any sane alternative, and in the end
it's not any worse than plumbing the notification vector into kvm_get_apic_interrupt();
either way, we're bleeding implementation details between common x86 code and
nVMX.
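
Roughly, the find/decide/ACK split would be shaped like the sketch below. Again,
this is a self-contained toy and not the actual v2: toy_apic, toy_find_irq(),
toy_ack_irq() and toy_handle_nested_irq() are hypothetical names, and the APIC
is reduced to a single pending vector. What happens to the pending notification
vector on the posted interrupt path isn't covered here, so the toy simply
leaves it alone.

```c
#include <assert.h>

/* Toy model only; none of these names exist in KVM. */
struct toy_apic {
	int pending;	/* highest-priority pending vector, -1 if none */
	int in_service;	/* last ACKed vector, -1 if none */
};

enum toy_outcome {
	TOY_NONE,		/* nothing pending */
	TOY_POSTED_INTR,	/* posted interrupt processing was emulated */
	TOY_NESTED_VMEXIT,	/* nested VM-Exit was synthesized */
};

/* Step 1: find the pending vector without any side effects. */
static int toy_find_irq(const struct toy_apic *apic)
{
	return apic->pending;
}

/* Step 3: ACK the vector, i.e. move it from pending to in-service. */
static void toy_ack_irq(struct toy_apic *apic, int vector)
{
	apic->in_service = vector;
	apic->pending = -1;
}

static enum toy_outcome toy_handle_nested_irq(struct toy_apic *apic,
					      int notification_vector)
{
	int vector = toy_find_irq(apic);

	if (vector < 0)
		return TOY_NONE;

	/* Step 2: decide what to emulate before touching APIC state. */
	if (vector == notification_vector)
		return TOY_POSTED_INTR;	/* no VM-Exit, so no ACK here */

	/*
	 * A nested VM-Exit would be synthesized at this point (including the
	 * switch back to vmcs01), and only then is the IRQ ACKed.
	 */
	toy_ack_irq(apic, vector);
	return TOY_NESTED_VMEXIT;
}
```

The point of the ordering is that the ACK, and thus any side effects hiding
behind it, only runs after the VM-Exit has been emulated.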

Luckily, this series is sitting at the top of `kvm-x86 vmx` (yay, topic branches!),
so I'll just drop the entire series and post a full v2. Unless I botched this
new version too (haven't tested yet), I should get v2 posted tomorrow.

Sorry for pushing garbage; this should never have been posted, let alone gotten
applied to -next.