Re: [PATCH] KVM/VMX: Invoke NMI non-IST entry instead of IST entry

From: Lai Jiangshan
Date: Wed May 05 2021 - 11:44:58 EST




On 2021/5/5 08:00, Thomas Gleixner wrote:
> On Tue, May 04 2021 at 23:56, Paolo Bonzini wrote:
>> On 04/05/21 23:51, Sean Christopherson wrote:
>>> On Tue, May 04, 2021, Paolo Bonzini wrote:
>>>> On 04/05/21 23:23, Andy Lutomirski wrote:
>>>>> On May 4, 2021, at 2:21 PM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>>>>>> FWIW, NMIs are masked if the VM-Exit was due to an NMI.
>>>>
>>>> Huh, indeed: "An NMI causes subsequent NMIs to be blocked, but only after
>>>> the VM exit completes".
>>>>
>>>>> Then this whole change is busted, since nothing will unmask NMIs. Revert it?
>>>> Looks like the easiest way out indeed.
>>>
>>> I've no objection to reverting to intn, but what does reverting versus handling
>>> NMI on the kernel stack have to do with NMIs being blocked on VM-Exit due to NMI?
>>> I'm struggling mightily to connect the dots.
>>
>> Nah, you're right: vmx_do_interrupt_nmi_irqoff will not call the handler
>> directly, rather it calls the IDT entrypoint which *will* do an IRET and
>> unmask NMIs. I trusted Andy too much on this one. :)
>>
>> Thomas's posted patch ("[PATCH] KVM/VMX: Invoke NMI non-IST entry
>> instead of IST entry") looks good.
>
> Well, looks good is one thing.
>
> It would be more helpful if someone would actually review and/or test it.
>
> Thanks,
>
> tglx


I tested it with the following testing patch applied, and the results show
that the problem is fixed.

The single line added to vmenter.S in the testing patch emulates the
situation where "uninitialized" garbage on the kernel stack happens to be 1
and happens to sit at the same RSP-relative location as the "NMI executing"
variable.
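
To illustrate why a stray 1 in that slot matters, here is a minimal
userspace sketch in C. It is only a model under stated assumptions, not the
kernel's entry code: struct stack_model, ist_style_entry() and the handled
counter are hypothetical names, and the model assumes that an IST-style
entry which finds 1 in the slot treats the NMI as nested and returns without
running the handler body.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the stack slot read as "NMI executing". */
struct stack_model {
	unsigned long nmi_executing;
};

/* Number of times the modelled handler body actually ran. */
static unsigned long handled;

/*
 * Modelled IST-style entry (assumption, simplified): a slot value of 1 is
 * taken to mean "already inside an NMI", so the handler body is skipped.
 * Returns true if the handler body ran.
 */
static bool ist_style_entry(struct stack_model *stk)
{
	if (stk->nmi_executing == 1)
		return false;		/* looks nested: handler not run */

	stk->nmi_executing = 1;
	handled++;			/* stands in for the real NMI work */
	stk->nmi_executing = 0;
	return true;
}

/* Small self-check: a clean slot is handled, a garbage 1 is not. */
void self_check(void)
{
	struct stack_model clean = { .nmi_executing = 0 };
	struct stack_model garbage = { .nmi_executing = 1 };
	unsigned long before = handled;

	assert(ist_style_entry(&clean));
	assert(handled == before + 1);

	assert(!ist_style_entry(&garbage));
	assert(handled == before + 1);	/* the NMI was silently dropped */
}
```

In this model the second call corresponds to what the testing patch forces:
the slot holds 1 although no NMI is in progress, so the handler never runs.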


First round:
# apply the testing patch
# perf record events of a VM which does kbuild inside
# dmesg shows the same number of "kvm nmi" and "kvm nmi miss" messages
This shows that the problem exists: the NMI handler is not actually run
when it is invoked this way.

Second round:
# apply the fix from tglx
# apply the testing patch
# perf record events of a VM which does kbuild inside
# dmesg shows some "kvm nmi" messages but no "kvm nmi miss"
This shows that the problem is fixed.
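
The bookkeeping behind the two rounds can be sketched in C as follows. This
is a hedged userspace model, not the testing patch itself: nmi_count stands
in for irq_stat.__nmi_count, invoke_entry() and run_round() are hypothetical
helpers, and the model assumes that the non-IST entry never consults the
"NMI executing" slot while the IST entry does.

```c
#include <assert.h>
#include <stdbool.h>

/* Messages a round would print: "kvm nmi" and "kvm nmi miss". */
struct round_result {
	unsigned int nmi;
	unsigned int miss;
};

/* Stands in for the per-CPU irq_stat.__nmi_count (assumption). */
static unsigned long nmi_count;

/*
 * Modelled entry invocation. The slot is always 1, as forced by the
 * testing patch. Only an entry that checks the slot skips the handler.
 */
static void invoke_entry(bool entry_checks_slot)
{
	const bool slot_is_one = true;

	if (entry_checks_slot && slot_is_one)
		return;		/* mistaken for a nested NMI */
	nmi_count++;		/* handler body ran */
}

/* Run a number of modelled NMI VM-exits and tally the messages. */
struct round_result run_round(unsigned int exits, bool entry_checks_slot)
{
	struct round_result r = { 0, 0 };
	unsigned int i;

	for (i = 0; i < exits; i++) {
		unsigned long count = nmi_count;	/* snapshot before */

		invoke_entry(entry_checks_slot);
		r.nmi++;				/* "kvm nmi" */
		if (count == nmi_count)
			r.miss++;			/* "kvm nmi miss" */
	}
	return r;
}
```

Calling run_round(n, true) models the first round (every NMI is a miss),
and run_round(n, false) models the second round (no misses).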


diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 3a6461694fc2..32096049c2a2 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -316,6 +316,7 @@ SYM_FUNC_START(vmx_do_interrupt_nmi_irqoff)
 #endif
 	pushf
 	push $__KERNEL_CS
+	movq $1, -24(%rsp) // "NMI executing": 1 = nested, non-1 = not-nested
 	CALL_NOSPEC _ASM_ARG1

 	/*
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 8586eca349a9..eefd22d22fce 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6439,8 +6439,17 @@ static void vmx_handle_exit_irqoff(struct kvm_vcpu *vcpu)

 	if (vmx->exit_reason.basic == EXIT_REASON_EXTERNAL_INTERRUPT)
 		handle_external_interrupt_irqoff(vcpu);
-	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI)
+	else if (vmx->exit_reason.basic == EXIT_REASON_EXCEPTION_NMI) {
+		unsigned long count = this_cpu_read(irq_stat.__nmi_count);
+
 		handle_exception_nmi_irqoff(vmx);
+
+		if (is_nmi(vmx_get_intr_info(&vmx->vcpu))) {
+			pr_info("kvm nmi\n");
+			if (count == this_cpu_read(irq_stat.__nmi_count))
+				pr_info("kvm nmi miss\n");
+		}
+	}
 }

 /*