RE: [PATCH] KVM: nVMX: Dynamically compute max VMCS index for vmcs12

From: Hu, Robert
Date: Wed Jul 21 2021 - 06:13:05 EST


> -----Original Message-----
> From: Sean Christopherson <seanjc@xxxxxxxxxx>
> Sent: Tuesday, June 22, 2021 01:08
> To: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>; Wanpeng Li
> <wanpengli@xxxxxxxxxxx>; Jim Mattson <jmattson@xxxxxxxxxx>; Joerg
> Roedel <joro@xxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH] KVM: nVMX: Dynamically compute max VMCS index for
> vmcs12
>
> On Mon, Jun 21, 2021, Paolo Bonzini wrote:
> > On 18/06/21 23:46, Sean Christopherson wrote:
> > > Calculate the max VMCS index for vmcs12 by walking the array to find
> > > the actual max index. Hardcoding the index is prone to bitrot, and
> > > the calculation is only done on KVM bringup (albeit on every CPU,
> > > but there aren't _that_ many null entries in the array).
> > >
> > > Fixes: 3c0f99366e34 ("KVM: nVMX: Add a TSC multiplier field in
> > > VMCS12")
> > > Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> > > ---
> > >
> > > Note, the vmx test in kvm-unit-tests will still fail using stock
> > > QEMU, as QEMU also hardcodes and overwrites the MSR. The test
> > > passes if I hack KVM to ignore userspace (it was easier than rebuilding
> QEMU).
> >
> > Queued, thanks. Without having checked the kvm-unit-tests sources
> > very thoroughly, this might be a configuration issue in
> > kvm-unit-tests; in theory "-cpu host" (unlike "-cpu
> > host,migratable=no") should not enable TSC scaling.
>
> As noted in the code comments, KVM allows VMREAD/VMWRITE to all defined
> fields, whether or not the field should actually exist for the vCPU model doesn't
> enter into the equation. That's technically wrong as there are a number of
> fields that the SDM explicitly states exist iff a certain feature is supported.
[Hu, Robert]
It's right that a number of fields' existence depends on some certain feature.
Meanwhile, "IA32_VMX_VMCS_ENUM indicates to software the highest index
value used in the encoding of any field *supported* by the processor", rather than
*existed*.
So my understanding is no matter what VMCS exec control field's value is set, value
of IA32_VMX_VMCS_ENUM shall not be affected, as it reports the physical CPU's capability
rather than runtime VMCS configuration.
Back to nested case, L1's VMCS configuration lays the "physical" capability for L2, right?
nested_vmx_msrs or at least nested_vmx_msrs.vmcs_enum shall be put to vcpu
scope, rather than current kvm global.
Current nested_vmx_calc_vmcs_enum_msr() is invoked at early stage, before vcpu features
are settled. I think should be moved to later stage as well.

> To fix that we'd need to add a "feature flag" to vmcs_field_to_offset_table that is
> checked against the vCPU model, though updating the MSR would probably fall
> onto userspace's shoulders?
>
> And FWIW, this is the QEMU code:
>
> #define VMCS12_MAX_FIELD_INDEX (0x17)
>
> static void kvm_msr_entry_add_vmx(X86CPU *cpu, FeatureWordArray f)
> {
> ...
>
> /*
> * Just to be safe, write these with constant values. The CRn_FIXED1
> * MSRs are generated by KVM based on the vCPU's CPUID.
> */
> kvm_msr_entry_add(cpu, MSR_IA32_VMX_CR0_FIXED0,
> CR0_PE_MASK | CR0_PG_MASK | CR0_NE_MASK);
> kvm_msr_entry_add(cpu, MSR_IA32_VMX_CR4_FIXED0,
> CR4_VMXE_MASK);
> kvm_msr_entry_add(cpu, MSR_IA32_VMX_VMCS_ENUM,
> VMCS12_MAX_FIELD_INDEX << 1);
> }