[PATCH v2] KVM: nVMX: Don't expose eVMCS unsupported fields to L1
From: Anirudh Rayabharam
Date: Tue Jun 28 2022 - 06:33:41 EST
When running cloud-hypervisor tests, VM entry into an L2 guest on KVM on
Hyper-V fails with this splat (stripped for brevity):
[ 1481.600386] WARNING: CPU: 4 PID: 7641 at arch/x86/kvm/vmx/nested.c:4563 nested_vmx_vmexit+0x70d/0x790 [kvm_intel]
[ 1481.600427] CPU: 4 PID: 7641 Comm: vcpu2 Not tainted 5.15.0-1008-azure #9-Ubuntu
[ 1481.600429] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 07/22/2021
[ 1481.600430] RIP: 0010:nested_vmx_vmexit+0x70d/0x790 [kvm_intel]
[ 1481.600447] Call Trace:
[ 1481.600449] <TASK>
[ 1481.600451] nested_vmx_reflect_vmexit+0x10b/0x440 [kvm_intel]
[ 1481.600457] __vmx_handle_exit+0xef/0x670 [kvm_intel]
[ 1481.600467] vmx_handle_exit+0x12/0x50 [kvm_intel]
[ 1481.600472] vcpu_enter_guest+0x83a/0xfd0 [kvm]
[ 1481.600524] vcpu_run+0x5e/0x240 [kvm]
[ 1481.600560] kvm_arch_vcpu_ioctl_run+0xd7/0x550 [kvm]
[ 1481.600597] kvm_vcpu_ioctl+0x29a/0x6d0 [kvm]
[ 1481.600634] __x64_sys_ioctl+0x91/0xc0
[ 1481.600637] do_syscall_64+0x5c/0xc0
[ 1481.600667] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 1481.600670] RIP: 0033:0x7f688becdaff
[ 1481.600686] </TASK>
The TSC multiplier field is currently not supported by KVM's eVMCS
implementation. Hyper-V did not support it originally but has since
added it. Because KVM does not support it, the "use TSC scaling" control
is filtered out of vmcs_config by evmcs_sanitize_exec_ctrls().
However, nested_vmx_setup_ctls_msrs() still exposes TSC scaling to L1
because it does not sanitize the controls that eVMCS does not support.
When L1 tries to launch an L2 guest, vmcs12 has TSC scaling enabled, and
this propagates to vmcs02. KVM never sets the TSC multiplier, however,
because kvm_has_tsc_control is false. As a result, VM entry for the L2
guest fails. (VM entry fails if "use TSC scaling" is 1 but the TSC
multiplier is 0.)
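For illustration only, the failure mode can be modelled in a few lines of
standalone C. This is a simplified sketch, not KVM code: struct sketch_vmcs,
sketch_prepare_vmcs02() and sketch_vmentry_ok() are hypothetical names, and
only the single consistency check mentioned above is modelled. The bit
position (25) is that of "use TSC scaling" in the secondary
processor-based controls.

```c
#include <stdbool.h>
#include <stdint.h>

/* "Use TSC scaling" is bit 25 of the secondary processor-based controls. */
#define SKETCH_SECONDARY_EXEC_TSC_SCALING (UINT32_C(1) << 25)

/* Hypothetical, simplified view of the two vmcs02 fields involved. */
struct sketch_vmcs {
	uint32_t secondary_exec_ctls;
	uint64_t tsc_multiplier;
};

/*
 * Model of how the control bit from vmcs12 reaches vmcs02 while the
 * multiplier is only written when the host reports TSC control.
 */
static struct sketch_vmcs sketch_prepare_vmcs02(uint32_t vmcs12_ctls,
						bool host_has_tsc_control,
						uint64_t multiplier)
{
	struct sketch_vmcs vmcs02 = {
		.secondary_exec_ctls = vmcs12_ctls,
		.tsc_multiplier = 0,
	};

	if (host_has_tsc_control)
		vmcs02.tsc_multiplier = multiplier;

	return vmcs02;
}

/* Models only the check cited above: scaling enabled with multiplier 0. */
static bool sketch_vmentry_ok(const struct sketch_vmcs *vmcs)
{
	if ((vmcs->secondary_exec_ctls & SKETCH_SECONDARY_EXEC_TSC_SCALING) &&
	    vmcs->tsc_multiplier == 0)
		return false;

	return true;
}
```

With the control bit set by L1 and host_has_tsc_control false, the model
rejects the entry, which corresponds to the situation behind the splat
above.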
To fix this, sanitize the values read from the MSRs in
nested_vmx_setup_ctls_msrs() by filtering out the controls that are not
supported by eVMCS.
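The filtering itself is a plain mask operation on the "allowed-1" half of
each control MSR. A minimal standalone sketch follows, assuming a
hypothetical helper sketch_sanitize_ctls_high(); the patch open-codes the
same operation per control group under static_branch_unlikely(&enable_evmcs)
rather than using a helper like this.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Clear every control bit that eVMCS cannot express from the "allowed-1"
 * (high) half of a VMX control MSR value. Nothing is changed when eVMCS
 * is not in use.
 *
 * ctls_high:         allowed-1 bits as read from the MSR
 * evmcs_unsupported: mask of controls eVMCS does not support, e.g. one of
 *                    the EVMCS1_UNSUPPORTED_* masks
 * evmcs_enabled:     stands in for the enable_evmcs static key
 */
static uint32_t sketch_sanitize_ctls_high(uint32_t ctls_high,
					  uint32_t evmcs_unsupported,
					  bool evmcs_enabled)
{
	if (evmcs_enabled)
		ctls_high &= ~evmcs_unsupported;

	return ctls_high;
}
```

Applying the mask before the existing whitelist of nested-supported
controls means L1 never sees a control that eVMCS cannot back.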
This is a stable-friendly intermediate fix. A more comprehensive fix is
in progress [1] but is probably too complicated to safely apply to
stable.
[1]: https://lore.kernel.org/kvm/20220627160440.31857-1-vkuznets@xxxxxxxxxx/
Fixes: d041b5ea93352 ("KVM: nVMX: Enable nested TSC scaling")
Signed-off-by: Anirudh Rayabharam <anrayabh@xxxxxxxxxxxxxxxxxxx>
---
Changes since v1:
- Sanitize all eVMCS unsupported fields instead of just TSC scaling.
v1: https://lore.kernel.org/lkml/20220613161611.3567556-1-anrayabh@xxxxxxxxxxxxxxxxxxx/
---
arch/x86/kvm/vmx/nested.c | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index f5cb18e00e78..f88d748c7cc6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6564,6 +6564,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->pinbased_ctls_high);
msrs->pinbased_ctls_low |=
PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR;
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->pinbased_ctls_high &= ~EVMCS1_UNSUPPORTED_PINCTRL;
+#endif
msrs->pinbased_ctls_high &=
PIN_BASED_EXT_INTR_MASK |
PIN_BASED_NMI_EXITING |
@@ -6580,6 +6584,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->exit_ctls_low =
VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR;
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->exit_ctls_high &= ~EVMCS1_UNSUPPORTED_VMEXIT_CTRL;
+#endif
msrs->exit_ctls_high &=
#ifdef CONFIG_X86_64
VM_EXIT_HOST_ADDR_SPACE_SIZE |
@@ -6600,6 +6608,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->entry_ctls_high);
msrs->entry_ctls_low =
VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR;
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->entry_ctls_high &= ~EVMCS1_UNSUPPORTED_VMENTRY_CTRL;
+#endif
msrs->entry_ctls_high &=
#ifdef CONFIG_X86_64
VM_ENTRY_IA32E_MODE |
@@ -6657,6 +6669,10 @@ void nested_vmx_setup_ctls_msrs(struct nested_vmx_msrs *msrs, u32 ept_caps)
msrs->secondary_ctls_high);
msrs->secondary_ctls_low = 0;
+#if IS_ENABLED(CONFIG_HYPERV)
+ if (static_branch_unlikely(&enable_evmcs))
+ msrs->secondary_ctls_high &= ~EVMCS1_UNSUPPORTED_2NDEXEC;
+#endif
msrs->secondary_ctls_high &=
SECONDARY_EXEC_DESC |
SECONDARY_EXEC_ENABLE_RDTSCP |
--
2.34.1