Re: About patch bdedff263132 - KVM: x86: Route pending NMIs

From: Sean Christopherson
Date: Mon Oct 30 2023 - 11:10:54 EST


+KVM and LKML

https://people.kernel.org/tglx/notes-about-netiquette

On Mon, Oct 30, 2023, Prasad Pandit wrote:
> Hello Sean,
>
> Please see:
> -> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bdedff263132c862924f5cad96f0e82eeeb4e2e6
>
> * While testing a real-time host/guest setup, the above patch is
> causing a strange regression wherien guest boot delays by indefinite
> time. Sometimes it boots within a minute, sometimes it takes much
> longer. Maybe the guest VM is waiting for a NMI event.
>
> * Reverting the above patch helps to fix this issue. I'm wondering if
> a fix patch like below would be acceptable OR reverting above patch is
> more reasonable?

No, a revert would break AMD's vNMI.

> ===
> # cat ~test/rpmbuild/SOURCES/linux-kernel-test.patch
> +++ linux-5.14.0-372.el9/arch/x86/kvm/x86.c 2023-10-30
> 09:05:05.172815973 -0400
> @@ -5277,7 +5277,8 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_e
> if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> vcpu->arch.nmi_pending = 0;
> atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
> - kvm_make_request(KVM_REQ_NMI, vcpu);
> + if (events->nmi.pending)
> + kvm_make_request(KVM_REQ_NMI, vcpu);

This looks sane, but it should be unnecessary as KVM_REQ_NMI nmi_queued=0 should
be a (costly) nop. Hrm, unless the vCPU is in HLT, in which case KVM will treat
a spurious KVM_REQ_NMI as a wake event. When I made this change, my assumption
was that userspace would set KVM_VCPUEVENT_VALID_NMI_PENDING iff there was
relevant information to process. But if I'm reading the code correctly, QEMU
invokes KVM_SET_VCPU_EVENTS with KVM_VCPUEVENT_VALID_NMI_PENDING at the end of
machine creation.

Hmm, but even that should be benign unless userspace is stuffing other guest
state. E.g. KVM will spuriously exit to userspace with -EAGAIN while the vCPU
is in KVM_MP_STATE_UNINITIALIZED, and I don't see a way for the vCPU to be put
into a blocking state after transitioning out of UNINITIATED via INIT+SIPI without
processing KVM_REQ_NMI.

> }
> static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
> ===
>
> * Could you please have a look and suggest what could be a better fix?

Please provide more information on what is breaking and/or how to reproduce the
issue. E.g. at the very least, a trace of KVM_{G,S}ET_VCPU_EVENTS. There's not
even enough info here to write a changelog.