Re: [PATCH RFC v1 00/10] KVM: nVMX: Use vmcs_config for setting up nested VMX MSRs

From: Sean Christopherson
Date: Thu Jun 23 2022 - 20:31:26 EST


On Wed, Jun 22, 2022, Vitaly Kuznetsov wrote:
> vmcs_config is a sanitized version of host VMX MSRs where some controls are
> filtered out (e.g. when Enlightened VMCS is enabled, some know bugs are
> discovered, some inconsistencies in controls are detected,...) but
> nested_vmx_setup_ctls_msrs() uses raw host MSRs instead. This may end up
> in exposing undesired controls to L1. Switch to using vmcs_config instead.
>
> RFC part: vmcs_config's sanitization now is a mix of "what can't be enabled"
> and "what KVM doesn't want" and we need to separate these as for nested VMX
> MSRs only the first category makes sense. This gives vmcs_config a slightly
> different meaning "controls which can be (theoretically) used". An alternative
> approach would be to store sanitized host MSRs values separately, sanitize
> them and and use in nested_vmx_setup_ctls_msrs() but currently I don't see
> any benefits. Comments welcome!

I like the idea overall, even though it's a decent amount of churn. It seems
easier to maintain than separate paths for nested. The alternative would be to
add common helpers to adjust the baseline configurations, but I don't see any
way to programmatically make that approach more robust.

An idea to further harden things. Or: an excuse to extend macro shenanigans :-)

If we throw all of the "opt" and "min" lists into macros, e.g. KVM_REQUIRED_*
and KVM_OPTIONAL_*, and then use those to define a KVM_KNOWN_* field, we can
prevent using the mutators to set/clear unknown bits at runtime via BUILD_BUG_ON().
The core builders, e.g. vmx_exec_control(), can still set unknown bits, i.e. set
bits that aren't enumerated to L1, but that's easier to audit and this would catch
regressions for the issues fixed in patches.

It'll required making add_atomic_switch_msr_special() __always_inline (or just
open code it), but that's not a big deal.

E.g. if we have

#define KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL <blah blah blah>
#define KVM_OPTIONAL_CPU_BASED_VM_EXEC_CONTROL <blah blah blah>

Then the builders for the controls shadows can do:

diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 286c88e285ea..5eb75822a09e 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -468,6 +468,8 @@ static inline u8 vmx_get_rvi(void)
}

#define BUILD_CONTROLS_SHADOW(lname, uname, bits) \
+#define KVM_KNOWN_ ## uname \
+ (KVM_REQUIRED_ ## uname | KVM_OPTIONAL_ ## uname) \
static inline void lname##_controls_set(struct vcpu_vmx *vmx, u##bits val) \
{ \
if (vmx->loaded_vmcs->controls_shadow.lname != val) { \
@@ -485,10 +487,12 @@ static inline u##bits lname##_controls_get(struct vcpu_vmx *vmx) \
} \
static inline void lname##_controls_setbit(struct vcpu_vmx *vmx, u##bits val) \
{ \
+ BUILD_BUG_ON(!(val & KVM_KNOWN_ ## uname)); \
lname##_controls_set(vmx, lname##_controls_get(vmx) | val); \
} \
static inline void lname##_controls_clearbit(struct vcpu_vmx *vmx, u##bits val) \
{ \
+ BUILD_BUG_ON(!(val & KVM_KNOWN_ ## uname)); \
lname##_controls_set(vmx, lname##_controls_get(vmx) & ~val); \
}
BUILD_CONTROLS_SHADOW(vm_entry, VM_ENTRY_CONTROLS, 32)



Handling the controls that are restricted to CONFIG_X86_64=y will be midly annoying,
but adding a base set isn't too bad, e.g.

#define __KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL <blah blah blah>
#ifdef CONFIG_X86_64
#define KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL (__KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL | \
CPU_BASED_CR8_LOAD_EXITING | \
CPU_BASED_CR8_STORE_EXITING)
#else
#define KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL __KVM_REQUIRED_CPU_BASED_VM_EXEC_CONTROL
#endif