Re: [PATCH v2 07/11] KVM: x86: add a delayed hardware NMI injection interface

From: Santosh Shukla
Date: Tue Feb 14 2023 - 05:23:06 EST




On 2/8/2023 9:36 PM, Sean Christopherson wrote:
> On Wed, Feb 08, 2023, Santosh Shukla wrote:
>> On 2/1/2023 3:58 AM, Sean Christopherson wrote:
>>> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>>>> @@ -5191,9 +5191,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
>>>>
>>>> vcpu->arch.nmi_injected = events->nmi.injected;
>>>> if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
>>>> - vcpu->arch.nmi_pending = events->nmi.pending;
>>>> + atomic_add(events->nmi.pending, &vcpu->arch.nmi_queued);
>>>> +
>>>> static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>>>>
>>>> + process_nmi(vcpu);
>>>
>>> Argh, having two process_nmi() calls is ugly (not blaming your code, it's KVM's
>>> ABI that's ugly). E.g. if we collapse this down, it becomes:
>>>
>>> process_nmi(vcpu);
>>>
>>> if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
>>> <blah blah blah>
>>> }
>>> static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>>>
>>> process_nmi(vcpu);
>>>
>>> And the second mess is that V_NMI needs to be cleared.
>>>
>>
>> Can you please elaborate on "V_NMI cleared" scenario? Are you mentioning
>> about V_NMI_MASK or svm->nmi_masked?
>
> V_NMI_MASK. KVM needs to purge any pending virtual NMIs when userspace sets
> vCPU event state and KVM_VCPUEVENT_VALID_NMI_PENDING is set.
>

As per the APM, V_NMI_MASK is managed by the processor:
"
V_NMI_MASK: Indicates whether virtual NMIs are masked. The processor will set V_NMI_MASK
once it takes the virtual NMI. V_NMI_MASK is cleared when the guest successfully completes an
IRET instruction or #VMEXIT occurs while delivering the virtual NMI
"

In my initial implementation I had changed V_NMI_MASK for the SMM scenario [1].
That is also not required, as HW will save V_NMI/V_NMI_MASK on
SMM entry and restore them on RSM.

That said, svm_{get,set}_nmi_mask() will look something like:

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index a9e9bfbffd72..08911a33cf1e 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3635,13 +3635,21 @@ static int svm_nmi_allowed(struct kvm_vcpu *vcpu, bool for_injection)

static bool svm_get_nmi_mask(struct kvm_vcpu *vcpu)
{
- return to_svm(vcpu)->nmi_masked;
+ struct vcpu_svm *svm = to_svm(vcpu);
+
+ if (is_vnmi_enabled(svm))
+ return svm->vmcb->control.int_ctl & V_NMI_MASK;
+ else
+ return svm->nmi_masked;
}

static void svm_set_nmi_mask(struct kvm_vcpu *vcpu, bool masked)
{
struct vcpu_svm *svm = to_svm(vcpu);

+ if (is_vnmi_enabled(svm))
+ return;
+
if (masked) {
svm->nmi_masked = true;
svm_set_iret_intercept(svm);

Is there any input on the above approach?

[1] https://lore.kernel.org/all/20220810061226.1286-4-santosh.shukla@xxxxxxx/

>>> The first process_nmi() effectively exists to (a) purge nmi_queued and (b) keep
>>> nmi_pending if KVM_VCPUEVENT_VALID_NMI_PENDING is not set. I think we can just
>>> replace that with a set of nmi_queued, i.e.
>>>
>>> if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
>>> vcpu->arch.nmi_pending = 0;
>>> atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
>>> process_nmi(vcpu);
>>>
>> You mean replace above process_nmi() with kvm_make_request(KVM_REQ_NMI, vcpu), right?
>> I'll try with above proposal.
>
> Yep, if that works. Actually, that might be a requirement. There's a
>
> static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>
> lurking below this. Invoking process_nmi() before NMI blocking is updated could
> result in KVM incorrectly dropping/keeping NMIs. I don't think it would be a
> problem in practice since KVM saves only one NMI, but userspace could stuff NMIs.
>
> Huh. The existing code is buggy. events->nmi.pending is a u8, and
> arch.nmi_pending is an unsigned int. KVM doesn't cap the incoming value, so
> userspace could set up to 255 pending NMIs. The extra weird part is that the extra
> NMIs will get dropped the next time KVM stumbles through process_nmi().
>
> Amusingly, KVM only saves one pending NMI, i.e. in a true migration scenario KVM
> may drop an NMI.
>
> events->nmi.pending = vcpu->arch.nmi_pending != 0;
>
> The really amusing part is that that code was added by 7460fb4a3400 ("KVM: Fix
> simultaneous NMIs"). The only thing I can figure is that KVM_GET_VCPU_EVENTS was
> somewhat blindly updated without much thought about what should actually happen.
>
> So, can you slide the below in early in the series? Then in this series, convert
> to the above suggested flow (zero nmi_pending, stuff nmi_queued) in another patch?
>
> From: Sean Christopherson <seanjc@xxxxxxxxxx>
> Date: Wed, 8 Feb 2023 07:44:16 -0800
> Subject: [PATCH] KVM: x86: Save/restore all NMIs when multiple NMIs are
> pending
>
> Save all pending NMIs in KVM_GET_VCPU_EVENTS, and queue KVM_REQ_NMI if one
> or more NMIs are pending after KVM_SET_VCPU_EVENTS in order to re-evaluate
> pending NMIs with respect to NMI blocking.
>
> KVM allows multiple NMIs to be pending in order to faithfully emulate bare
> metal handling of simultaneous NMIs (on bare metal, truly simultaneous
> NMIs are impossible, i.e. one will always arrive first and be consumed).
> Support for simultaneous NMIs botched the save/restore though. KVM only
> saves one pending NMI, but allows userspace to restore 255 pending NMIs
> as kvm_vcpu_events.nmi.pending is a u8, and KVM's internal state is stored
> in an unsigned int.
>
> Fixes: 7460fb4a3400 ("KVM: Fix simultaneous NMIs")
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
> arch/x86/kvm/x86.c | 7 +++++--
> 1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 508074e47bc0..e9339acbf82a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5115,7 +5115,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct kvm_vcpu *vcpu,
> events->interrupt.shadow = static_call(kvm_x86_get_interrupt_shadow)(vcpu);
>
> events->nmi.injected = vcpu->arch.nmi_injected;
> - events->nmi.pending = vcpu->arch.nmi_pending != 0;
> + events->nmi.pending = vcpu->arch.nmi_pending;
> events->nmi.masked = static_call(kvm_x86_get_nmi_mask)(vcpu);
>
> /* events->sipi_vector is never valid when reporting to user space */
> @@ -5202,8 +5202,11 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,
> events->interrupt.shadow);
>
> vcpu->arch.nmi_injected = events->nmi.injected;
> - if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING)
> + if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
> vcpu->arch.nmi_pending = events->nmi.pending;
> + if (vcpu->arch.nmi_pending)
> + kvm_make_request(KVM_REQ_NMI, vcpu);
> + }
> static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);
>
> if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR &&
>

Ok.

On top of the above, I am including your suggested change below:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e0855599df65..437a6cea3bc7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5201,9 +5201,9 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct kvm_vcpu *vcpu,

vcpu->arch.nmi_injected = events->nmi.injected;
if (events->flags & KVM_VCPUEVENT_VALID_NMI_PENDING) {
- vcpu->arch.nmi_pending = events->nmi.pending;
- if (vcpu->arch.nmi_pending)
- kvm_make_request(KVM_REQ_NMI, vcpu);
+ vcpu->arch.nmi_pending = 0;
+ atomic_set(&vcpu->arch.nmi_queued, events->nmi.pending);
+ kvm_make_request(KVM_REQ_NMI, vcpu);
}
static_call(kvm_x86_set_nmi_mask)(vcpu, events->nmi.masked);

Does that make sense?

> base-commit: 6c77ae716d546d71b21f0c9ee7d405314a3f3f9e