[RFC PATCH v3 03/10] KVM: nVMX: Enable SPEC_CTRL virtualization for vmcs02

From: Chao Gao
Date: Wed Apr 10 2024 - 10:36:28 EST


Enable SPEC_CTRL virtualization for vmcs02 to prevent nested guests from
changing the SPEC_CTRL bits that userspace doesn't allow a guest to change.

Propagate the SPEC_CTRL virtualization bit of the tertiary VM-execution
controls from vmcs01 to vmcs02 and program the IA32_SPEC_CTRL mask as the
userspace VMM requested.

With SPEC_CTRL virtualization enabled, a guest RDMSR of IA32_SPEC_CTRL
returns the shadow value stored in the VMCS. To give the guest a consistent
view across nested VMX transitions, propagate the shadow value between
vmcs01 and vmcs02.

Signed-off-by: Chao Gao <chao.gao@xxxxxxxxx>
---
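Illustration only, not part of the patch: a minimal user-space sketch of the
behaviour described in the changelog. It assumes that a guest WRMSR stores
the full value in the shadow while leaving the masked bits of the real MSR
untouched, and that a guest RDMSR returns the shadow. All names below
(vmcs_model, model_wrmsr, model_rdmsr, model_switch_vmcs) are hypothetical
and do not exist in KVM.

```c
#include <stdint.h>

/* Hypothetical per-VMCS state: the mask and the guest-visible shadow. */
struct vmcs_model {
	uint64_t mask;   /* bits the guest is not allowed to change */
	uint64_t shadow; /* value a guest RDMSR returns */
};

/*
 * Assumed guest WRMSR semantics: the shadow takes the written value as is,
 * while the real MSR keeps its current value in every masked bit position.
 */
static void model_wrmsr(struct vmcs_model *vmcs, uint64_t *hw, uint64_t val)
{
	vmcs->shadow = val;
	*hw = (*hw & vmcs->mask) | (val & ~vmcs->mask);
}

/* Assumed guest RDMSR semantics: return the shadow, not the real MSR. */
static uint64_t model_rdmsr(const struct vmcs_model *vmcs)
{
	return vmcs->shadow;
}

/*
 * Nested VM-entry (vmcs01 -> vmcs02) or VM-exit (vmcs02 -> vmcs01): carry
 * the mask and the shadow over so the guest-visible value stays the same.
 */
static void model_switch_vmcs(struct vmcs_model *to,
			      const struct vmcs_model *from)
{
	to->mask = from->mask;
	to->shadow = from->shadow;
}
```

In this model, with bit 0 forced on by the mask, a write of 0 from L2 leaves
bit 0 set in the real MSR, yet both L2 and (after the exit) L1 read back 0.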
arch/x86/kvm/vmx/nested.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d05ddf751491..174790b2ffbc 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2381,6 +2381,20 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct loaded_vmcs *vmcs0
 		secondary_exec_controls_set(vmx, exec_control);
 	}

+	/*
+	 * TERTIARY EXEC CONTROLS
+	 */
+	if (cpu_has_tertiary_exec_ctrls()) {
+		exec_control = __tertiary_exec_controls_get(vmcs01);
+
+		exec_control &= TERTIARY_EXEC_SPEC_CTRL_SHADOW;
+		if (exec_control & TERTIARY_EXEC_SPEC_CTRL_SHADOW)
+			vmcs_write64(IA32_SPEC_CTRL_MASK,
+				     vmx->vcpu.kvm->arch.force_spec_ctrl_mask);
+
+		tertiary_exec_controls_set(vmx, exec_control);
+	}
+
 	/*
 	 * ENTRY CONTROLS
 	 *
@@ -2625,6 +2639,19 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12,
 	if (kvm_caps.has_tsc_control)
 		vmcs_write64(TSC_MULTIPLIER, vcpu->arch.tsc_scaling_ratio);

+	/*
+	 * L2 after nested VM-entry should observe the same value of
+	 * IA32_SPEC_CTRL MSR as L1 unless:
+	 * a. L1 loads IA32_SPEC_CTRL via MSR-load area.
+	 * b. L1 enables IA32_SPEC_CTRL virtualization. This cannot
+	 *    happen since KVM doesn't expose this feature to L1.
+	 *
+	 * Propagate spec_ctrl_shadow (the value guest will get via RDMSR)
+	 * to vmcs02. Later nested_vmx_load_msr() will take care of case a.
+	 */
+	if (vmx->nested.nested_run_pending && cpu_has_spec_ctrl_shadow())
+		vmcs_write64(IA32_SPEC_CTRL_SHADOW, vmx->spec_ctrl_shadow);
+
 	nested_vmx_transition_tlb_flush(vcpu, vmcs12, true);

 	if (nested_cpu_has_ept(vmcs12))
@@ -4883,6 +4910,9 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
 		vmx_update_cpu_dirty_logging(vcpu);
 	}

+	if (cpu_has_spec_ctrl_shadow())
+		vmcs_write64(IA32_SPEC_CTRL_SHADOW, vmx->spec_ctrl_shadow);
+
 	/* Unpin physical memory we referred to in vmcs02 */
 	kvm_vcpu_unmap(vcpu, &vmx->nested.apic_access_page_map, false);
 	kvm_vcpu_unmap(vcpu, &vmx->nested.virtual_apic_map, true);
--
2.39.3