Re: [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS
From: Sean Christopherson
Date: Thu Aug 18 2022 - 13:18:47 EST
On Tue, Aug 02, 2022, Vitaly Kuznetsov wrote:
> Enlightened VMCS v1 got updated and now includes the required fields
> for TSC scaling and loading PERF_GLOBAL_CTRL upon VMENTER/VMEXIT
> features. For Hyper-V on KVM enablement, KVM can just observe VMX control
> MSRs and use the features (with or without eVMCS) when
> possible. Hyper-V on KVM case is trickier because of live migration:
> the new features require explicit enablement from VMM to not break
> it. Luckily, the updated eVMCS revision comes with a feature bit in
> CPUID.0x4000000A.EBX.
The refactor to implement the 2-d array should go in a prep patch. Unless you
(or anyone else) objects to the feedback below, I'll do the split myself, post v6,
and queue this up (without patch 1, and assuming there're no major issues in the
back half of the series).
> Signed-off-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> ---
> arch/x86/kvm/hyperv.c | 2 +-
> arch/x86/kvm/vmx/evmcs.c | 88 +++++++++++++++++++++++++++++++--------
> arch/x86/kvm/vmx/evmcs.h | 17 ++------
> arch/x86/kvm/vmx/nested.c | 2 +-
> arch/x86/kvm/vmx/vmx.c | 2 +-
> 5 files changed, 78 insertions(+), 33 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 1098915360ae..8a2b24f9bbf6 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -2552,7 +2552,7 @@ int kvm_get_hv_cpuid(struct kvm_vcpu *vcpu, struct kvm_cpuid2 *cpuid,
> case HYPERV_CPUID_NESTED_FEATURES:
> ent->eax = evmcs_ver;
> ent->eax |= HV_X64_NESTED_MSR_BITMAP;
> -
> + ent->ebx |= HV_X64_NESTED_EVMCS1_2022_UPDATE;
> break;
>
> case HYPERV_CPUID_SYNDBG_VENDOR_AND_MAX_FUNCTIONS:
> diff --git a/arch/x86/kvm/vmx/evmcs.c b/arch/x86/kvm/vmx/evmcs.c
> index 8bea5dea0341..e8497f9854a1 100644
> --- a/arch/x86/kvm/vmx/evmcs.c
> +++ b/arch/x86/kvm/vmx/evmcs.c
> @@ -368,7 +368,60 @@ uint16_t nested_get_evmcs_version(struct kvm_vcpu *vcpu)
> return 0;
> }
>
> -void nested_evmcs_filter_control_msr(u32 msr_index, u64 *pdata)
> +enum evmcs_revision {
> + EVMCSv1_2016,
> + EVMCSv1_2022,
> + EVMCS_REVISION_MAX,
"MAX" is technically wrong, the "max" entry is usually the last valid entry. This
should be NR_EVMCS_REVISIONS, and then NR_EVMCS_CTRLS below.
> +};
> +
> +enum evmcs_unsupported_ctrl_type {
I think drop the "unsupported" here, because the naming gets weird when trying to
use it for something other than indexing evmcs_unsupported_ctls (see bottom).
> + EVMCS_EXIT_CTLS,
> + EVMCS_ENTRY_CTLS,
s/CTLS/CTRLS for consistency.
> + EVMCS_2NDEXEC,
> + EVMCS_PINCTRL,
> + EVMCS_VMFUNC,
> + EVMCS_CTRL_MAX,
> +};
> +
> +static u32 evmcs_unsupported_ctls[EVMCS_CTRL_MAX][EVMCS_REVISION_MAX] = {
This can be "const". And s/ctls/ctrls for consistency.
> + [EVMCS_EXIT_CTLS] = {
> + [EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL | VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL,
> + [EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMEXIT_CTRL,
> + },
> + [EVMCS_ENTRY_CTLS] = {
> + [EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMENTRY_CTRL | VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL,
> + [EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMENTRY_CTRL,
> + },
> + [EVMCS_2NDEXEC] = {
> + [EVMCSv1_2016] = EVMCS1_UNSUPPORTED_2NDEXEC | SECONDARY_EXEC_TSC_SCALING,
> + [EVMCSv1_2022] = EVMCS1_UNSUPPORTED_2NDEXEC,
> + },
> + [EVMCS_PINCTRL] = {
> + [EVMCSv1_2016] = EVMCS1_UNSUPPORTED_PINCTRL,
> + [EVMCSv1_2022] = EVMCS1_UNSUPPORTED_PINCTRL,
> + },
> + [EVMCS_VMFUNC] = {
> + [EVMCSv1_2016] = EVMCS1_UNSUPPORTED_VMFUNC,
> + [EVMCSv1_2022] = EVMCS1_UNSUPPORTED_VMFUNC,
> + },
> +};
> +
> +static u32 evmcs_get_unsupported_ctls(struct kvm_vcpu *vcpu,
> + enum evmcs_unsupported_ctrl_type ctrl_type)
> +{
> + struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
> + enum evmcs_revision evmcs_rev = EVMCSv1_2016;
> +
> + if (!hv_vcpu)
This is a functiontal change, and I don't think it's correct either. Previously,
KVM would apply the EVMCSv1_2016 filter irrespective of whether or not
vcpu->arch.hyperv is non-NULL. nested_enable_evmcs() doesn't require a Hyper-V
vCPU, and AFAICT nothing requires a Hyper-V vCPU to use eVMCS.
So I believe this should be:
if (hv_vcpu &&
hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
evmcs_rev = EVMCSv1_2022;
> + return 0;
> +
> + if (hv_vcpu->cpuid_cache.nested_ebx & HV_X64_NESTED_EVMCS1_2022_UPDATE)
> + evmcs_rev = EVMCSv1_2022;
> +
> + return evmcs_unsupported_ctls[ctrl_type][evmcs_rev];
> +}
> +
...
> -int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
> +int nested_evmcs_check_controls(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
> {
> int ret = 0;
> u32 unsupp_ctl;
>
> unsupp_ctl = vmcs12->pin_based_vm_exec_control &
> - EVMCS1_UNSUPPORTED_PINCTRL;
> + evmcs_get_unsupported_ctls(vcpu, EVMCS_PINCTRL);
> if (unsupp_ctl) {
> trace_kvm_nested_vmenter_failed(
> - "eVMCS: unsupported pin-based VM-execution controls",
> + "eVMCS: unsupported pin-based VM-execution controls: ",
> unsupp_ctl);
Egad! This is all broken, at least in theory. The second param to
trace_kvm_nested_vmenter_failed() is the error code, not the bad value. It mostly
works because __print_symbolic() falls back to printing the hex value on error,
but that relies on there being no collisions/matches between unsupp_ctl and the
set of VMX_VMENTER_INSTRUCTION_ERRORS. E.g. if there's a collision then the
tracepoint will pretty print a string and cause confusion.
And checking every control even after one fails seems unnecessary subtle. It
provides marginal benefit while increasing the probability that we screw up and
squash "ret" to zero. Yeah, it might save some onion peeling, but if that happens
during production and now during initial development of a feature, then someone
has much bigger problems to worry about.
Unless there are objections, I'll fold in a patch to yield:
#define CC KVM_NESTED_VMENTER_CONSISTENCY_CHECK
static bool nested_evmcs_is_valid_controls(enum evmcs_ctrl_type type, u32 val)
{
return val & evmcs_get_unsupported_ctls(type);
}
int nested_evmcs_check_controls(struct vmcs12 *vmcs12)
{
if (CC(!nested_evmcs_is_valid_controls(EVMCS_PINCTRL,
vmcs12->pin_based_vm_exec_control)))
return -EINVAL;
if (CC(!nested_evmcs_is_valid_controls(EVMCS_2NDEXEC,
vmcs12->secondary_vm_exec_control)))
return -EINVAL;
if (CC(!nested_evmcs_is_valid_controls(EVMCS_EXIT_CTRLS,
vmcs12->vm_exit_controls)))
return -EINVAL;
if (CC(!nested_evmcs_is_valid_controls(EVMCS_ENTRY_CTRLS,
vmcs12->vm_entry_controls)))
return -EINVAL;
if (CC(!nested_evmcs_is_valid_controls(EVMCS_VMFUNC,
vmcs12->vm_function_control)))
return -EINVAL;
return 0;
}