Re: [PATCH v2 0/7] KVM: nVMX: Fixes for nested state migration when eVMCS is in use
From: Vitaly Kuznetsov
Date: Mon May 24 2021 - 08:44:47 EST
Maxim Levitsky <mlevitsk@xxxxxxxxxx> writes:
> On Mon, 2021-05-17 at 15:50 +0200, Vitaly Kuznetsov wrote:
>> Changes since v1 (Sean):
>> - Drop now-unneeded curly braces in nested_sync_vmcs12_to_shadow().
>> - Pass 'evmcs->hv_clean_fields' instead of 'bool from_vmentry' to
>> copy_enlightened_to_vmcs12().
>>
>> Commit f5c7e8425f18 ("KVM: nVMX: Always make an attempt to map eVMCS after
>> migration") fixed the most obvious reason why Hyper-V on KVM (e.g. Win10
>> + WSL2) was crashing immediately after migration. It was also reported
>> that we have more issues to fix as, while the failure rate was lowered
>> significantly, it was still possible to observe crashes after several
>> dozen migrations. It turns out the issue arises when we manage to issue
>> KVM_GET_NESTED_STATE right after an L2->L1 VMEXIT but before L1 gets a chance
>> to run. This state is tracked with 'need_vmcs12_to_shadow_sync' flag but
>> the flag itself is not part of saved nested state. A few other less
>> significant issues are fixed along the way.
>>
>> While there's no proof this series fixes all eVMCS related problems,
>> Win10+WSL2 was able to survive 3333 (thanks, Max!) migrations without
>> crashing in testing.
>>
>> Patches are based on the current kvm/next tree.
>>
>> Vitaly Kuznetsov (7):
>> KVM: nVMX: Introduce nested_evmcs_is_used()
>> KVM: nVMX: Release enlightened VMCS on VMCLEAR
>> KVM: nVMX: Ignore 'hv_clean_fields' data when eVMCS data is copied in
>> vmx_get_nested_state()
>> KVM: nVMX: Force enlightened VMCS sync from nested_vmx_failValid()
>> KVM: nVMX: Reset eVMCS clean fields data from prepare_vmcs02()
>> KVM: nVMX: Request to sync eVMCS from VMCS12 after migration
>> KVM: selftests: evmcs_test: Test that KVM_STATE_NESTED_EVMCS is never
>> lost
>>
>> arch/x86/kvm/vmx/nested.c | 110 ++++++++++++------
>> .../testing/selftests/kvm/x86_64/evmcs_test.c | 64 +++++-----
>> 2 files changed, 115 insertions(+), 59 deletions(-)
>>
>
> Hi Vitaly!
>
> In addition to the review of this patch series,
Thanks by the way!
> I would like
> to share an idea on how to avoid the hack of mapping the evmcs
> in nested_vmx_vmexit, because I think I found a possible generic
> solution to this and similar issues:
>
> The solution is to always set nested_run_pending after
> nested migration (which means that we won't really
> need to migrate this flag anymore).
>
> I was thinking a lot about it and I think that there is no downside to this,
> other than an occasional extra vmexit after migration.
>
> Otherwise there is always a risk of the following scenario:
>
> 1. We migrate with nested_run_pending=0 (but don't restore all the state
> yet, like that HV_X64_MSR_VP_ASSIST_PAGE msr,
> or just the guest memory map is not up to date, guest is in smm or something
> like that)
>
> 2. Userspace calls some ioctl that causes a nested vmexit
>
> This can happen today if the userspace calls
> kvm_arch_vcpu_ioctl_get_mpstate -> kvm_apic_accept_events -> kvm_check_nested_events
>
> 3. Userspace finally sets correct guest's msrs, correct guest memory map and only
> then calls KVM_RUN
>
> This means that at (2) we can't map and write the evmcs/vmcs12/vmcb12 even
> if KVM_REQ_GET_NESTED_STATE_PAGES is pending,
> but we have to do so to complete the nested vmexit.
Why do we need to write to eVMCS to complete vmexit? AFAICT, there's
only one place which calls copy_vmcs12_to_enlightened():
nested_sync_vmcs12_to_shadow() which, in its turn, has only 1 caller:
vmx_prepare_switch_to_guest(), so unless userspace decides to execute a
not-fully-restored guest this should not happen. I'm probably missing
something in your scenario.
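
To illustrate the call chain I mean, here is a toy userspace model. It is
not the code in nested.c: the toy_* names and the struct are made up for
illustration, and the behaviour is only my reading of the code as described
above (nested vmexit just records that a sync is needed, and the actual
write happens on the way to running the guest).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant vCPU state; not KVM's struct. */
struct toy_vcpu {
	bool need_vmcs12_to_shadow_sync; /* set on nested vmexit */
	int evmcs_writes;                /* counts simulated eVMCS writes */
};

/* Models copy_vmcs12_to_enlightened(): the only place eVMCS is written. */
static void toy_copy_vmcs12_to_enlightened(struct toy_vcpu *v)
{
	v->evmcs_writes++;
}

/* Models nested_sync_vmcs12_to_shadow(): write, then clear the flag. */
static void toy_nested_sync_vmcs12_to_shadow(struct toy_vcpu *v)
{
	toy_copy_vmcs12_to_enlightened(v);
	v->need_vmcs12_to_shadow_sync = false;
}

/* Models vmx_prepare_switch_to_guest(): the sole caller of the sync. */
static void toy_prepare_switch_to_guest(struct toy_vcpu *v)
{
	if (v->need_vmcs12_to_shadow_sync)
		toy_nested_sync_vmcs12_to_shadow(v);
}

/* Models the nested vmexit path: it only requests a later sync. */
static void toy_nested_vmexit(struct toy_vcpu *v)
{
	v->need_vmcs12_to_shadow_sync = true;
}
```

In this model a nested vmexit triggered from an ioctl leaves evmcs_writes
untouched; the counter only moves once the vCPU is actually prepared to run.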
>
> To some extent, the entry to the nested mode after a migration is only complete
> when we process the KVM_REQ_GET_NESTED_STATE_PAGES, so we shouldn't interrupt it.
>
> This will allow us to avoid dealing with KVM_REQ_GET_NESTED_STATE_PAGES on
> nested vmexit path at all.
Remember, we have three possible states when nested state is
transferred:
1) L2 was running
2) L1 was running
3) We're in between L2 and L1 (need_vmcs12_to_shadow_sync = true).
Is 'nested_run_pending' suitable for all of them? Could you maybe draft
a patch so we can see how this works (in both 'normal' and 'evmcs'
cases)?
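
For reference, the three cases can be written down as a small toy
classifier. This is only a sketch to pin down what I mean: the enum, the
struct and classify() are hypothetical names, and I'm assuming that
'guest_mode' takes precedence, i.e. a vCPU in L2 counts as case 1
regardless of the sync flag.

```c
#include <stdbool.h>

/* Hypothetical names; a toy model, not KVM's data structures. */
enum nested_xfer_state {
	XFER_L2_RUNNING,        /* 1) L2 was running */
	XFER_L1_RUNNING,        /* 2) L1 was running */
	XFER_BETWEEN_L2_AND_L1, /* 3) L2 exited, L1 has not run yet */
};

struct toy_nested_state {
	bool guest_mode;                 /* vCPU is in L2 */
	bool need_vmcs12_to_shadow_sync; /* sync towards L1 is pending */
};

/* Map the transferred flags onto one of the three cases. */
static enum nested_xfer_state classify(const struct toy_nested_state *s)
{
	if (s->guest_mode)
		return XFER_L2_RUNNING; /* assumption: guest mode wins */
	if (s->need_vmcs12_to_shadow_sync)
		return XFER_BETWEEN_L2_AND_L1;
	return XFER_L1_RUNNING;
}
```

Whatever rule is chosen for 'nested_run_pending' after migration would need
to give a sensible result for each of these three return values.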
--
Vitaly