Re: [PATCH v5 05/15] KVM: nVMX: Let userspace set nVMX MSR to any _host_ supported value
From: Sean Christopherson
Date: Mon Oct 31 2022 - 13:11:39 EST
On Tue, Nov 01, 2022, Yu Zhang wrote:
> Hi Sean & Paolo,
>
> On Tue, Jun 07, 2022 at 09:35:54PM +0000, Sean Christopherson wrote:
> > Restrict the nVMX MSRs based on KVM's config, not based on the guest's
> > current config. Using the guest's config to audit the new config
> > prevents userspace from restoring the original config (KVM's config) if
> > at any point in the past the guest's config was restricted in any way.
>
> May I ask for an example here, to explain why we use the KVM config
> here, instead of the guest's? I mean, the guest's config can be
> adjusted after cpuid updates by vmx_vcpu_after_set_cpuid(). Yet the
> msr settings in vmcs_config.nested might be outdated by then.
vmcs_config.nested never becomes out-of-date; it's read-only after __init (it's not
currently marked as such, but that will be remedied soon).
The auditing performed by KVM is purely to guard against userspace enabling
features that KVM doesn't support. KVM is not responsible for ensuring that the
vCPU's CPUID model matches the VMX MSR model.
An example would be if userspace loaded the VMX MSRs with a default model, and
then enabled features one-by-one. In practice this doesn't happen because it's
more performant to gather all features and do a single KVM_SET_MSRS, but it's a
legitimate approach that KVM should allow.
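To make the failure mode concrete, here is a toy userspace model (not KVM code) of the two auditing strategies. It only models the allowed-1 half of a control MSR as a 32-bit mask; the struct, the function names and the FEAT_* bits are all made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, not real VMX control encodings. */
#define FEAT_A (1u << 0)
#define FEAT_B (1u << 1)
#define FEAT_C (1u << 2)

/* Toy model: what KVM supports (fixed at init) vs. the vCPU's current value. */
struct toy_vcpu {
	uint32_t kvm_allowed1;   /* analogous to vmcs_config.nested, never changes */
	uint32_t guest_allowed1; /* analogous to the vCPU's current MSR value */
};

/* True if every bit set in 'subset' is also set in 'superset'. */
static bool toy_is_subset(uint32_t superset, uint32_t subset)
{
	return (superset | subset) == superset;
}

/* Audit against the guest's current value: bits can only ever be removed. */
static bool toy_set_msr_audit_guest(struct toy_vcpu *v, uint32_t val)
{
	if (!toy_is_subset(v->guest_allowed1, val))
		return false;
	v->guest_allowed1 = val;
	return true;
}

/* Audit against KVM's value: anything KVM supports is accepted. */
static bool toy_set_msr_audit_kvm(struct toy_vcpu *v, uint32_t val)
{
	if (!toy_is_subset(v->kvm_allowed1, val))
		return false;
	v->guest_allowed1 = val;
	return true;
}
```

With the guest-based audit, once userspace writes a restricted value (say FEAT_A only), a later write of FEAT_A | FEAT_B is rejected even though KVM supports both bits. With the KVM-based audit that write is accepted, and only bits KVM doesn't support (FEAT_C in this model) are refused.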
> Another question is about the setting of secondary_ctls_high in
> nested_vmx_setup_ctls_msrs(). I saw there's a comment saying:
> "Do not include those that depend on CPUID bits, they are
> added later by vmx_vcpu_after_set_cpuid.".
That's a stale comment, see the very next commit, 8805875aa473 ("Revert "KVM: nVMX:
Do not expose MPX VMX controls when guest MPX disabled""), as well as the slightly
later commit 9389d5774aca ("Revert "KVM: nVMX: Expose load IA32_PERF_GLOBAL_CTRL
VM-{Entry,Exit} control"").
> But since cpuid updates can adjust the vmx->nested.msrs.secondary_ctls_high,
> do we really need to clear those flags for secondary_ctls_high in this
> global config?
As above, the comment is stale; KVM should not manipulate the VMX MSRs in response
to guest CPUID changes. The one exception to this is reserved CR0/CR4 bits. We
discussed quirking that behavior, but ultimately decided not to because (a) no
userspace actually cares and (b) KVM would effectively need to make up behavior
if userspace allowed the guest to load CR4 bits via VM-Enter or VM-Exit that are
disallowed by CPUID, e.g. L1 could end up running with a CR4 that is supposed to
be impossible according to CPUID.
> Could we just set
> msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl?
KVM already does that in upstream (with further sanitization). See commit
bcdf201f8a4d ("KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs").
> If yes, code(in nested_vmx_setup_ctls_msrs()) such as
> 	if (enable_ept) {
> 		/* nested EPT: emulate EPT also to L1 */
> 		msrs->secondary_ctls_high |=
> 			SECONDARY_EXEC_ENABLE_EPT;
This can't be completely removed, though unless I'm missing something, it can and
should be shifted to the sanitization code, e.g.
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 8f67a9c4a287..0c41d5808413 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -6800,6 +6800,7 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
 	msrs->secondary_ctls_high = vmcs_conf->cpu_based_2nd_exec_ctrl;
 	msrs->secondary_ctls_high &=
+		SECONDARY_EXEC_ENABLE_EPT |
 		SECONDARY_EXEC_DESC |
 		SECONDARY_EXEC_ENABLE_RDTSCP |
 		SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE |
@@ -6820,9 +6821,6 @@ void nested_vmx_setup_ctls_msrs(struct vmcs_config *vmcs_conf, u32 ept_caps)
 		SECONDARY_EXEC_SHADOW_VMCS;
 	if (enable_ept) {
-		/* nested EPT: emulate EPT also to L1 */
-		msrs->secondary_ctls_high |=
-			SECONDARY_EXEC_ENABLE_EPT;
 		msrs->ept_caps =
 			VMX_EPT_PAGE_WALK_4_BIT |
 			VMX_EPT_PAGE_WALK_5_BIT |
> or
> 	if (cpu_has_vmx_vmfunc()) {
> 		msrs->secondary_ctls_high |=
> 			SECONDARY_EXEC_ENABLE_VMFUNC;
This one is still required. KVM never enables VMFUNC for itself, i.e. it won't
be set in KVM's VMCS configuration.
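Roughly, the flow I have in mind looks like the sketch below. This is a standalone toy model, not the actual nested.c code: the TOY_* bit positions and the function name are invented, and the real whitelist is much longer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, not the real SECONDARY_EXEC_* encodings. */
#define TOY_EXEC_ENABLE_EPT    (1u << 0)
#define TOY_EXEC_DESC          (1u << 1)
#define TOY_EXEC_ENABLE_VMFUNC (1u << 2)
#define TOY_EXEC_NOT_EXPOSED   (1u << 3) /* used by KVM, never exposed to L1 */

/*
 * Toy model of computing the allowed-1 secondary controls for L1:
 *  1. start from the controls KVM itself enables,
 *  2. keep only the ones KVM is willing to expose to L1,
 *  3. add controls KVM emulates for L1 but never enables for itself.
 */
static uint32_t toy_nested_secondary_ctls(uint32_t kvm_2nd_exec_ctrl,
					  bool cpu_has_vmfunc)
{
	uint32_t high = kvm_2nd_exec_ctrl;

	/* Sanitize: EPT passes through here only if KVM has it enabled. */
	high &= TOY_EXEC_ENABLE_EPT | TOY_EXEC_DESC;

	/* VMFUNC is never in KVM's own config, so it must be OR'd in. */
	if (cpu_has_vmfunc)
		high |= TOY_EXEC_ENABLE_VMFUNC;

	return high;
}
```

In this model EPT needs no special casing because it is already in KVM's own config whenever KVM uses EPT, whereas VMFUNC can only show up via the explicit OR.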
> and other similar ones may also be unnecessary.