Re: [PATCH v5 2/2] kvm: nVMX: Introduce KVM_CAP_NESTED_STATE

From: Radim Krčmář
Date: Wed Jul 18 2018 - 13:55:53 EST


2018-07-10 11:27+0200, KarimAllah Ahmed:
> From: Jim Mattson <jmattson@xxxxxxxxxx>
>
> For nested virtualization, L0 KVM manages a bit of state for L2 guests that
> cannot be captured through the currently available IOCTLs. In fact, the
> state captured through all of these IOCTLs is usually a mix of L1 and L2
> state. It is also dependent on whether the L2 guest was running at the
> moment the process was interrupted to save its state.
>
> With this capability, there are two new vcpu ioctls: KVM_GET_NESTED_STATE
> and KVM_SET_NESTED_STATE. These can be used for saving and restoring a VM
> that is in VMX operation.
>
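
For readers following the thread, the intended userspace flow looks roughly
like this (a sketch only; the exact struct kvm_nested_state layout, the size
handshake, and the names NESTED_STATE_BUF_SIZE and vcpu_fd are assumptions
for illustration, not taken from the patch):

	/* Source host: snapshot the nested state of one vcpu. */
	struct kvm_nested_state *state = calloc(1, NESTED_STATE_BUF_SIZE);

	state->size = NESTED_STATE_BUF_SIZE;	/* buffer space offered to KVM */
	if (ioctl(vcpu_fd, KVM_GET_NESTED_STATE, state) < 0)
		err(1, "KVM_GET_NESTED_STATE");

	/* ... ship "state" along with the rest of the migration stream ... */

	/* Destination host: restore it after the usual register/MSR setup. */
	if (ioctl(vcpu_fd, KVM_SET_NESTED_STATE, state) < 0)
		err(1, "KVM_SET_NESTED_STATE");
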
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: kvm@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
> [karahmed@ - rename structs and functions and make them ready for AMD and
> address previous comments.
> - handle nested.smm state.
> - rebase & a bit of refactoring.
> - Merge 7/8 and 8/8 into one patch. ]
> Signed-off-by: KarimAllah Ahmed <karahmed@xxxxxxxxx>
> ---
> v4 -> v5:
> - Drop the update to KVM_REQUEST_ARCH_BASE in favor of a patch to switch to
> u64 instead.
> - Fix commit message.
> - Handle nested.smm state as well.
> - rebase
>
> v3 -> v4:
> - Rename function to have _nested
>
> v2 -> v3:
> - Remove the forced VMExit from L2 after reading the kvm_state. The actual
> problem is solved.
> - Rebase again!
> - Set nested_run_pending during restore (not sure if it makes sense yet or
> not).
> - Reduce KVM_REQUEST_ARCH_BASE to 7 instead of 8 (the other alternative is
> to switch everything to u64)
>
> v1 -> v2:
> - Rename structs and functions and make them ready for AMD and address
> previous comments.
> - Rebase & a bit of refactoring.
> - Merge 7/8 and 8/8 into one patch.
> - Force a VMExit from L2 after reading the kvm_state to avoid mixed state
> between L1 and L2 on resurrecting the instance.
> ---
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> @@ -12976,6 +12977,197 @@ static int enable_smi_window(struct kvm_vcpu *vcpu)
> +static int set_vmcs_cache(struct kvm_vcpu *vcpu,
> + struct kvm_nested_state __user *user_kvm_nested_state,
> + struct kvm_nested_state *kvm_state)
> +
> +{
> [...]
> +
> + if (kvm_state->flags & KVM_STATE_NESTED_RUN_PENDING)
> + vmx->nested.nested_run_pending = 1;
> +
> + if (check_vmentry_prereqs(vcpu, vmcs12) ||
> + check_vmentry_postreqs(vcpu, vmcs12, &exit_qual))
> + return -EINVAL;
> +
> + ret = enter_vmx_non_root_mode(vcpu);
> + if (ret)
> + return ret;
> +
> + /*
> + * The MMU is not initialized to point at the right entities yet and
> + * "get pages" would need to read data from the guest (i.e. we will
> + * need to perform gpa to hpa translation). So, this request will
> + * result in a call to nested_get_vmcs12_pages before the next
> + * VM-entry.
> + */
> + kvm_make_request(KVM_REQ_GET_VMCS12_PAGES, vcpu);
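
So readers can picture the other half: the request would presumably be
consumed in vcpu_enter_guest() before the next entry, along the usual
kvm_check_request() lines; the exact hook name below is only illustrative,
not necessarily what this patch wires up:

	if (kvm_check_request(KVM_REQ_GET_VMCS12_PAGES, vcpu))
		kvm_x86_ops->get_vmcs12_pages(vcpu);	/* ends up in nested_get_vmcs12_pages() */
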
> +
> + vmx->nested.nested_run_pending = 1;

This is not necessary. We're only copying state and do not add anything
that would be lost on a nested VM exit without prior VM entry.

> +

Halting the VCPU should probably be done here, just like at the end of
nested_vmx_run().
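
Something along the lines of the tail of nested_vmx_run(), i.e. (untested
sketch, just to illustrate the suggestion):

	/*
	 * If the saved L2 was halted and no event injection is going to wake
	 * it up, put the vcpu back into the halted state as well.
	 */
	if (vmcs12->guest_activity_state == GUEST_ACTIVITY_HLT &&
	    !(vmcs12->vm_entry_intr_info_field & INTR_INFO_VALID_MASK))
		kvm_vcpu_halt(vcpu);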

> + return 0;
> +}
> +
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> @@ -963,6 +963,7 @@ struct kvm_ppc_resize_hpt {
> #define KVM_CAP_GET_MSR_FEATURES 153
> #define KVM_CAP_HYPERV_EVENTFD 154
> #define KVM_CAP_HYPERV_TLBFLUSH 155
> +#define KVM_CAP_STATE 156

KVM_CAP_NESTED_STATE

(good documentation makes the code look worse. :])