[PATCH v5 06/15] KVM: nVMX: Keep KVM updates to BNDCFGS ctrl bits across MSR write

From: Sean Christopherson
Date: Tue Jun 07 2022 - 21:33:43 EST


From: Oliver Upton <oupton@xxxxxxxxxx>

Since commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX controls
when guest MPX disabled"), KVM has taken ownership of the "load
IA32_BNDCFGS" and "clear IA32_BNDCFGS" VMX entry/exit controls. The ABI
is that these bits must be set in the IA32_VMX_TRUE_{ENTRY,EXIT}_CTLS
MSRs if the guest's CPUID supports MPX, and clear otherwise.

However, commit aedbaf4f6afd ("KVM: x86: Extract
kvm_update_cpuid_runtime() from kvm_update_cpuid()") partially broke KVM
ownership of the aforementioned bits. Before, kvm_update_cpuid() was
exercised frequently when running a guest and constantly applied its own
changes to the BNDCFGS bits. Now, the BNDCFGS bits are only ever
updated after a KVM_SET_CPUID/KVM_SET_CPUID2 ioctl, meaning that a
subsequent MSR write from userspace will clobber these values.

Uphold the old ABI by reapplying KVM's tweaks to the BNDCFGS bits after
an MSR write from userspace.

Note, the old ABI that is being preserved is a KVM hack to workaround a
userspace bug; see commit 5f76f6f5ff96 ("KVM: nVMX: Do not expose MPX VMX
controls when guest MPX disabled"). VMCS controls are not tied to CPUID,
i.e. KVM should not be mucking with unrelated things. The argument that
it's KVM's responsibility to propagate CPUID state to VMX MSRs doesn't
hold water, as the MPX shenanigans are an exception, not the norm, e.g.
KVM doesn't perform the following adjustments (and this is but a subset
of all possible tweaks):

X86_FEATURE_LM => VM_EXIT_HOST_ADDR_SPACE_SIZE, VM_ENTRY_IA32E_MODE,
VMX_MISC_SAVE_EFER_LMA

X86_FEATURE_TSC => CPU_BASED_RDTSC_EXITING, CPU_BASED_USE_TSC_OFFSETTING,
SECONDARY_EXEC_TSC_SCALING

X86_FEATURE_INVPCID_SINGLE => SECONDARY_EXEC_ENABLE_INVPCID

X86_FEATURE_MWAIT => CPU_BASED_MONITOR_EXITING, CPU_BASED_MWAIT_EXITING

X86_FEATURE_INTEL_PT => SECONDARY_EXEC_PT_CONCEAL_VMX, SECONDARY_EXEC_PT_USE_GPA,
VM_EXIT_CLEAR_IA32_RTIT_CTL, VM_ENTRY_LOAD_IA32_RTIT_CTL

X86_FEATURE_XSAVES => SECONDARY_EXEC_XSAVES

Fixes: aedbaf4f6afd ("KVM: x86: Extract kvm_update_cpuid_runtime() from kvm_update_cpuid()")
Signed-off-by: Oliver Upton <oupton@xxxxxxxxxx>
[sean: explicitly document the original KVM hack]
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
arch/x86/kvm/vmx/nested.c | 9 +++++++++
arch/x86/kvm/vmx/vmx.c | 2 +-
arch/x86/kvm/vmx/vmx.h | 2 ++
3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index fca30e79b3a0..d1c21d387716 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -1296,6 +1296,15 @@ vmx_restore_control_msr(struct vcpu_vmx *vmx, u32 msr_index, u64 data)
vmx_get_control_msr(&vmx->nested.msrs, msr_index, &lowp, &highp);
*lowp = data;
*highp = data >> 32;
+
+ /*
+ * To preserve an old, kludgy ABI, ensure KVM fiddling with the "true"
+ * entry/exit controls MSRs is preserved after userspace modifications.
+ */
+ if (msr_index == MSR_IA32_VMX_TRUE_ENTRY_CTLS ||
+ msr_index == MSR_IA32_VMX_TRUE_EXIT_CTLS)
+ nested_vmx_entry_exit_ctls_update(&vmx->vcpu);
+
return 0;
}

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 57df799ffa29..3f1671d7cbe4 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7398,7 +7398,7 @@ static void nested_vmx_cr_fixed1_bits_update(struct kvm_vcpu *vcpu)
#undef cr4_fixed1_update
}

-static void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
+void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 71bcb486e73f..576fed7e33de 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -425,6 +425,8 @@ static inline void vmx_set_intercept_for_msr(struct kvm_vcpu *vcpu, u32 msr,

void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu);

+void nested_vmx_entry_exit_ctls_update(struct kvm_vcpu *vcpu);
+
/*
* Note, early Intel manuals have the write-low and read-high bitmap offsets
* the wrong way round. The bitmaps control MSRs 0x00000000-0x00001fff and
--
2.36.1.255.ge46751e96f-goog