Re: [PATCH] x86/kvm/nVMCS: fix VMCLEAR when Enlightened VMCS is in use
From: Vitaly Kuznetsov
Date: Mon Jun 24 2019 - 10:16:38 EST
Liran Alon <liran.alon@xxxxxxxxxx> writes:
>> On 24 Jun 2019, at 16:30, Vitaly Kuznetsov <vkuznets@xxxxxxxxxx> wrote:
>>
>> When Enlightened VMCS is in use, it is valid to do VMCLEAR and,
>> according to TLFS, this should "transition an enlightened VMCS from the
>> active to the non-active state". It is, however, wrong to assume that
>> it is only valid to do VMCLEAR for the eVMCS which is currently active
>> on the vCPU performing VMCLEAR.
>>
>> Currently, the logic in handle_vmclear() is broken: in case, there is no
>> active eVMCS on the vCPU doing VMCLEAR we treat the argument as a 'normal'
>> VMCS and kvm_vcpu_write_guest() to the 'launch_state' field irreversibly
>> corrupts the memory area.
>>
>> So, in case the VMCLEAR argument is not the current active eVMCS on the
>> vCPU, how can we know if the area it is pointing to is a normal or an
>> enlightened VMCS?
>> Thanks to the bug in Hyper-V (see commit 72aeb60c52bf7 ("KVM: nVMX: Verify
>> eVMCS revision id match supported eVMCS version on eVMCS VMPTRLD")) we can
>> not, the revision can't be used to distinguish between them. So let's
>> assume it is always enlightened in case enlightened vmentry is enabled in
>> the assist page. Also, check if vmx->nested.enlightened_vmcs_enabled to
>> minimize the impact for 'unenlightened' workloads.
>>
>> Fixes: b8bbab928fb1 ("KVM: nVMX: implement enlightened VMPTRLD and VMCLEAR")
>> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
>> ---
>> arch/x86/kvm/vmx/evmcs.c | 18 ++++++++++++++++++
>> arch/x86/kvm/vmx/evmcs.h | 1 +
>> arch/x86/kvm/vmx/nested.c | 19 ++++++++-----------
>> 3 files changed, 27 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
>> index 1a6b3e1581aa..eae636ec0cc8 100644
>> --- a/arch/x86/kvm/vmx/evmcs.c
>> +++ b/arch/x86/kvm/vmx/evmcs.c
>> @@ -3,6 +3,7 @@
>> #include <linux/errno.h>
>> #include <linux/smp.h>
>>
>> +#include "../hyperv.h"
>> #include "evmcs.h"
>> #include "vmcs.h"
>> #include "vmx.h"
>> @@ -309,6 +310,23 @@ void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf)
>> }
>> #endif
>>
>> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr)
>
> I prefer to rename evmptr to evmcs_ptr. I think itâs more readable and sufficiently short.
> In addition, I think you should return either -1ull or assist_page.current_nested_vmcs.
> i.e. Donât return evmcs_ptr by pointer but instead as a return-value
> and get rid of the bool.
Sure, can do in v2.
>
>> +{
>> + struct hv_vp_assist_page assist_page;
>> +
>> + *evmptr = -1ull;
>> +
>> + if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
>> + return false;
>> +
>> + if (unlikely(!assist_page.enlighten_vmentry))
>> + return false;
>> +
>> + *evmptr = assist_page.current_nested_vmcs;
>> +
>> + return true;
>> +}
>> +
>> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
>> {
>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>> diff --git a/arch/x86/kvm/vmx/evmcs.h b/arch/x86/kvm/vmx/evmcs.h
>> index e0fcef85b332..c449e79a9c4a 100644
>> --- a/arch/x86/kvm/vmx/evmcs.h
>> +++ b/arch/x86/kvm/vmx/evmcs.h
>> @@ -195,6 +195,7 @@ static inline void evmcs_sanitize_exec_ctrls(struct vmcs_config *vmcs_conf) {}
>> static inline void evmcs_touch_msr_bitmap(void) {}
>> #endif /* IS_ENABLED(CONFIG_HYPERV) */
>>
>> +bool nested_enlightened_vmentry(struct kvm_vcpu *vcpu, u64 *evmptr);
>> uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu);
>> int nested_enable_evmcs(struct kvm_vcpu *vcpu,
>> uint16_t *vmcs_version);
>> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
>> index 9214b3aea1f9..ee8dda7d8a03 100644
>> --- a/arch/x86/kvm/vmx/nested.c
>> +++ b/arch/x86/kvm/vmx/nested.c
>> @@ -1765,26 +1765,21 @@ static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu,
>> bool from_launch)
>> {
>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>> - struct hv_vp_assist_page assist_page;
>> + u64 evmptr;
>
> I prefer to rename evmptr to evmcs_ptr. I think itâs more readable and sufficiently short.
>
Sure.
>>
>> if (likely(!vmx->nested.enlightened_vmcs_enabled))
>> return 1;
>>
>> - if (unlikely(!kvm_hv_get_assist_page(vcpu, &assist_page)))
>> + if (!nested_enlightened_vmentry(vcpu, &evmptr))
>> return 1;
>>
>> - if (unlikely(!assist_page.enlighten_vmentry))
>> - return 1;
>> -
>> - if (unlikely(assist_page.current_nested_vmcs !=
>> - vmx->nested.hv_evmcs_vmptr)) {
>> -
>> + if (unlikely(evmptr != vmx->nested.hv_evmcs_vmptr)) {
>> if (!vmx->nested.hv_evmcs)
>> vmx->nested.current_vmptr = -1ull;
>>
>> nested_release_evmcs(vcpu);
>>
>> - if (kvm_vcpu_map(vcpu, gpa_to_gfn(assist_page.current_nested_vmcs),
>> + if (kvm_vcpu_map(vcpu, gpa_to_gfn(evmptr),
>> &vmx->nested.hv_evmcs_map))
>> return 0;
>>
>> @@ -1826,7 +1821,7 @@ static int nested_vmx_handle_enlightened_vmptrld(struct kvm_vcpu *vcpu,
>> */
>> vmx->nested.hv_evmcs->hv_clean_fields &=
>> ~HV_VMX_ENLIGHTENED_CLEAN_FIELD_ALL;
>> - vmx->nested.hv_evmcs_vmptr = assist_page.current_nested_vmcs;
>> + vmx->nested.hv_evmcs_vmptr = evmptr;
>>
>> /*
>> * Unlike normal vmcs12, enlightened vmcs12 is not fully
>> @@ -4331,6 +4326,7 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>> struct vcpu_vmx *vmx = to_vmx(vcpu);
>> u32 zero = 0;
>> gpa_t vmptr;
>> + u64 evmptr;
>
> I prefer to rename evmptr to evmcs_ptr. I think itâs more readable and sufficiently short.
>
Sure.
>>
>> if (!nested_vmx_check_permission(vcpu))
>> return 1;
>> @@ -4346,7 +4342,8 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
>> return nested_vmx_failValid(vcpu,
>> VMXERR_VMCLEAR_VMXON_POINTER);
>>
>> - if (vmx->nested.hv_evmcs_map.hva) {
>> + if (unlikely(vmx->nested.enlightened_vmcs_enabled) &&
>> + nested_enlightened_vmentry(vcpu, &evmptr)) {
>> if (vmptr == vmx->nested.hv_evmcs_vmptr)
>
> Shouldnât you also remove the (vmptr == vmx->nested.hv_evmcs_vmptr) condition?
> To my understanding, vmx->nested.hv_evmcs_vmptr represents the address of the loaded eVMCS on current vCPU.
> But according to commit message, it is valid for vCPU to perform
> VMCLEAR on eVMCS that differ from loaded eVMCS on vCPU.
> E.g. The current vCPU may even have vmx->nested.hv_evmcs_vmptr set to
> -1ull.
nested_release_evmcs() unmaps current eVMCS on the vCPU, we can't easily
unmap eVMCS on other vCPUs without somehow synchronizing with
them. Actually, if we remove nested_release_evmcs() from here nothing is
going to change: the fact that eVMCS is mapped doesn't hurt much. If the
next enlightened vmentry is going to happen with the same evmptr we'll
have to map it back, in case a different one will be used - we'll unmap
the old.
In KVM, there's nothing we *have* to do to transition an eVMCS from
active to non-activer state. We, for example, don't enforce the
requirement that it can only be active on one vCPU at a time. Enforcing
it is expensive (some synchronization is required) and if L1 hypervisor
is misbehaving than, well, things are not going to work anyway.
That said I'm ok with dropping nested_release_evmcs() for consistency
but we can't just drop 'if (vmptr == vmx->nested.hv_evmcs_vmptr)'.
Thanks for your review!
--
Vitaly