Re: [PATCH v2 5/7] KVM: nVMX: Reset eVMCS clean fields data from prepare_vmcs02()

From: Maxim Levitsky
Date: Mon May 24 2021 - 08:34:44 EST


On Mon, 2021-05-17 at 15:50 +0200, Vitaly Kuznetsov wrote:
> When nested state migration happens during L1's execution, it
> is incorrect to modify eVMCS as it is L1 who 'owns' it at the moment.
> At least genuine Hyper-V seems to not be very happy when 'clean fields'
> data changes underneath it.
>
> 'Clean fields' data is used in KVM twice: by copy_enlightened_to_vmcs12()
> and prepare_vmcs02_rare() so we can reset it from prepare_vmcs02() instead.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> ---
> arch/x86/kvm/vmx/nested.c | 14 ++++++++------
> 1 file changed, 8 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index eb2d25a93356..3bfbf991bf45 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -2081,14 +2081,10 @@ void nested_sync_vmcs12_to_shadow(struct kvm_vcpu *vcpu)
>  {
>  	struct vcpu_vmx *vmx = to_vmx(vcpu);
>
> -	if (vmx->nested.hv_evmcs) {
> +	if (vmx->nested.hv_evmcs)
>  		copy_vmcs12_to_enlightened(vmx);
> -		/* All fields are clean */
> -		vmx->nested.hv_evmcs->hv_clean_fields |=
> -			HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
> -	} else {
> +	else
>  		copy_vmcs12_to_shadow(vmx);
> -	}
>
>  	vmx->nested.need_vmcs12_to_shadow_sync = false;
>  }
> @@ -2629,6 +2625,12 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
>
>  	kvm_rsp_write(vcpu, vmcs12->guest_rsp);
>  	kvm_rip_write(vcpu, vmcs12->guest_rip);
> +
> +	/* Mark all fields as clean so L1 hypervisor can set what's dirty */
> +	if (hv_evmcs)
> +		vmx->nested.hv_evmcs->hv_clean_fields |=
> +			HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
> +
>  	return 0;
>  }
>

Hi!

If we avoid calling copy_enlightened_to_vmcs12 from
vmx_get_nested_state, then we don't need this patch, right?

In addition to that, I think we need to find out why we touch
these clean bits at all. Going by the spec, and assuming that the
clean bits should behave similarly to how AMD does it, they should
only be set by L1 and never touched by us.
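
To make the expected semantics concrete, here is a minimal standalone
sketch (a model, not KVM code) of a consumer that only reads the clean
bits and never writes them back. All names and bit values below are
hypothetical stand-ins, not the real Hyper-V or KVM definitions:

```c
#include <stdint.h>

/* Hypothetical clean-field bits for this model only. */
#define MODEL_CLEAN_FIELD_GROUP0 (1u << 0)
#define MODEL_CLEAN_FIELD_GROUP1 (1u << 1)

/* Stand-in for the structure that L1 owns and writes. */
struct model_evmcs {
	uint32_t clean_fields;	/* set by L1 only in this model */
	uint64_t group0_val;
	uint64_t group1_val;
};

/* Stand-in for the copy that L0 keeps. */
struct model_cache {
	uint64_t group0_val;
	uint64_t group1_val;
};

/*
 * Reload only the groups that L1 did not mark clean. The evmcs is
 * const here on purpose: in this model L0 never modifies clean_fields.
 * Returns the number of groups that were reloaded.
 */
static int model_sync_from_evmcs(const struct model_evmcs *evmcs,
				 struct model_cache *cache)
{
	int reloaded = 0;

	if (!(evmcs->clean_fields & MODEL_CLEAN_FIELD_GROUP0)) {
		cache->group0_val = evmcs->group0_val;
		reloaded++;
	}
	if (!(evmcs->clean_fields & MODEL_CLEAN_FIELD_GROUP1)) {
		cache->group1_val = evmcs->group1_val;
		reloaded++;
	}
	return reloaded;
}
```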

We currently modify the clean bits in two places:

1. In nested_vmx_handle_enlightened_vmptrld, with vmlaunch. This seems
to be a workaround for the case (as we discussed on IRC) where L1
keeps more than one active evmcs on the same vcpu and 'vmresume's
them. Since we don't support this and have to do a full context switch
whenever we switch a vmcs, we reset the clean bits so that the evmcs
is loaded fully.
We also reset the clean bits when an evmcs is 'vmlaunched'. We need
to check whether that is needed, and if it is, we should probably
document that this is because of a bug in Hyper-V, as it really
should initialize these bits itself in this case.

I think that we should just ignore the clean bits in those cases
instead of resetting them in the evmcs.


2. In nested_sync_vmcs12_to_shadow, which in practice is done only
on nested vmexits, when we have updated the vmcs12 and need to update
the evmcs. In this case you told me that Hyper-V has a bug: it expects
us to mark the fields clean and doesn't do so on its own.
This makes sense, although it is not documented in the Hyper-V spec,
and I would appreciate it if we documented this explicitly in the code.
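
Roughly what I have in mind for both cases, as a standalone sketch
rather than a patch against nested.c. The struct, the function names
and the value of the 'all clean' mask are hypothetical; the first
helper shows ignoring the clean bits without writing to the evmcs, the
second shows the vmexit-side update with the reason spelled out in a
comment:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask value for this sketch only. */
#define SKETCH_CLEAN_FIELD_ALL 0xFFFFu

/* Stand-in for the L1-owned enlightened VMCS. */
struct sketch_evmcs {
	uint32_t clean_fields;
};

/*
 * Case 1: when the evmcs was just 'vmlaunched' or we switched to a
 * different evmcs, treat everything as dirty, but do it locally and
 * leave the evmcs itself untouched.
 */
static uint32_t sketch_effective_clean_fields(const struct sketch_evmcs *evmcs,
					      bool from_launch,
					      bool evmcs_changed)
{
	if (from_launch || evmcs_changed)
		return 0;	/* ignore the clean bits: full load */

	return evmcs->clean_fields;
}

/*
 * Case 2: on nested vmexit, after the evmcs has been updated.
 *
 * Assumption based on this thread, not on the Hyper-V spec: Hyper-V
 * expects L0 to mark all fields clean here and does not do it itself,
 * so this write is a deliberate workaround.
 */
static void sketch_mark_clean_on_vmexit(struct sketch_evmcs *evmcs)
{
	evmcs->clean_fields |= SKETCH_CLEAN_FIELD_ALL;
}
```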


Best regards,
Maxim Levitsky