Re: [PATCH v6 12/20] kvm/vmx: Emulate MSR TEST_CTL

From: Xiaoyao Li
Date: Mon Apr 08 2019 - 05:57:52 EST


Hi, Thomas,

On Fri, 2019-04-05 at 14:30 +0200, Thomas Gleixner wrote:
> On Wed, 3 Apr 2019, Fenghua Yu wrote:
> > @@ -1663,6 +1663,12 @@ static int vmx_get_msr(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> > u32 index;
> >
> > switch (msr_info->index) {
> > + case MSR_TEST_CTL:
> > + if (!msr_info->host_initiated &&
> > + !(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
>
> Errm? If the MSR_TEST_CTL is exposed to the guest via CPUID then the
> rdmsr() in the guest is not supposed to fail just because
> CORE_CAP_SPLIT_LOCK_DETECT is not set.

you're right.

> vmx->msr_test_ctl holds the proper value, which is either 0 or
> CORE_CAP_SPLIT_LOCK_DETECT until more bits are supported.
>
> So the chek needs to be guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY).

I don't think so. There is no dependecy between
guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY) and MSR_TEST_CTL.
guest_cpuid_has(X86_FEATURE_CORE_CAPABILITY) only indicates that guest has MSR
CORE_CAPABILITY.

Due to the fact that MSR_TEST_CTL is emulated with vmx->msr_test_ctl. I think it
's ok to always let userspace or guest to read MSR_TEST_CTL, it just returns the
emulated value. Like you said, " vmx->msr_test_ctl holds the proper value, which
is either 0 or CORE_CAP_SPLIT_LOCK_DETECT until more bits are supported."

> > + return 1;
> > + msr_info->data = vmx->msr_test_ctl;
> > + break;
> > #ifdef CONFIG_X86_64
> > case MSR_FS_BASE:
> > msr_info->data = vmcs_readl(GUEST_FS_BASE);
> > @@ -1797,6 +1803,14 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct
> > msr_data *msr_info)
> > u32 index;
> >
> > switch (msr_index) {
> > + case MSR_TEST_CTL:
> > + if (!(vcpu->arch.core_capability & CORE_CAP_SPLIT_LOCK_DETECT))
> > + return 1;
>
> Again, this wants to be a check for the availability of the MSR in the
> guest cpuid and not to the CORE_CAP_SPLIT_LOCK_DETECT magic bit.
>
> > +
> > + if (data & ~TEST_CTL_ENABLE_SPLIT_LOCK_DETECT)
> > + return 1;
>
> and this one wants to be:
>
> if (data & vmx->msr_test_ctl_mask)
>
> so additional bits can be enabled later on at exactly one point in the
> code, i.e. during vm init.

Again, in wrmsr handler, since MSR_TEST_CTL is emulated, it can be considered
that guest always has this MSR. It just needs to return 1 when reserved bits are
set.

Using vmx->msr_test_ctl_mask is a good idea. I will use it in next version.

> > + vmx->msr_test_ctl = data;
> > + break;
> >
> > +static void atomic_switch_msr_test_ctl(struct vcpu_vmx *vmx)
> > +{
> > + u64 host_msr_test_ctl;
> > +
> > + /* if TEST_CTL MSR doesn't exist on the hardware, do nothing */
> > + if (rdmsrl_safe(MSR_TEST_CTL, &host_msr_test_ctl))
> > + return;
>
> Yikes. So on hosts which do not have MSR_TEST_CTL this takes a fault on
> every VM enter. The host knows whether it has this MSR or not.
>
> Even if it exists there is no point to do the rdmsr on every vmenter. The
> value should be cached per cpu. It only changes when:
>
> 1) #AC triggers in kernel space
>
> 2) The sysfs knob is flipped
>
> #1 It happens either _BEFORE_ or _AFTER_ atomic_switch_msr_test_ctl(). In
> both cases the cached value is correct in atomic_switch_msr_test_ctl().
>
> If it happens _AFTER_ atomic_switch_msr_test_ctl() then the VMEXIT will
> restore the enabled bit, but there is nothing you can do about that.
>
> #2 CANNOT happen in the context of vcpu_run().
>
> So this wants to be:
>
> host_msr_test_ctl = this_cpu_read(msr_test_ctl_cache);

You're right and thanks for your advice.
I will take it in next version.

> > + if (host_msr_test_ctl == vmx->msr_test_ctl)
> > + clear_atomic_switch_msr(vmx, MSR_TEST_CTL);
> > + else
> > + add_atomic_switch_msr(vmx, MSR_TEST_CTL, vmx->msr_test_ctl,
> > + host_msr_test_ctl, false);
> > +}
> > +
> > u64 kvm_get_core_capability(void)
> > {
> > - return 0;
> > + u64 data;
> > +
> > + rdmsrl_safe(MSR_IA32_CORE_CAPABILITY, &data);
>
> if (!boot_cpu_has(X86_FEATURE_CORE_CAPABILITY))
> return 0;
>
> > + /* mask non-virtualizable functions */
> > + data &= CORE_CAP_SPLIT_LOCK_DETECT;
>
> Why?
>
> > + /*
> > + * There will be a list of FMS values that have split lock detection
> > + * but lack the CORE CAPABILITY MSR. In this case, set
> > + * CORE_CAP_SPLIT_LOCK_DETECT since we emulate MSR CORE_CAPABILITY.
>
> That's completely wrong. X86_FEATURE_SPLIT_LOCK_DETECT is set when the
> capability is enumerated in CPUID or it's set via the FMS quirk.
>
> So:
> data = 0;
> > + if (boot_cpu_has(X86_FEATURE_SPLIT_LOCK_DETECT))
> > + data |= CORE_CAP_SPLIT_LOCK_DETECT;
> > +
> > + return data;
>
> is absolutely sufficient.

Right, X86_FEATURE_SPLIT_LOCK_DETECT is set in 2 situation:

#1 the capability is enumerated by CORE_CAP_SPLIT_LOCK_DETECT (bit 5 of
MSR_IA32_CORE_CAPABILITY)

#2 via FMS list, in which those processors have split lock detection
feature but don't have MSR_IA32_CORE_CAPABILITY.

There two pathes work well for host, accurately for real FMS.
However, when it comes to virtualization, we can emulate a different FMS of
guest from host.

Considering the following case:

The host cpu is ICELAKE_MOBILE, which doesn't have X86_FEATURE_CORE_CAPABILITY
but has X86_FEATURE_SPLIT_LOCK_DETECT.
If we run a guest with cpu model, Skylake. It will hidden the feature
X86_FEATURE_SPLIT_LOCK_DETECT from guest, since guest's MSR_IA32_CORE_CAPABILITY
reports zero, and FMS of guest doesn't match the list.

In this way, it failed to expose the X86_FEATURE_SPLIT_LOCK_DETECT to guest.

Since MSR_IA32_CORE_CAPBILITY is emulated in software for guest. We can enforce
using #1 to report X86_FEATURE_SPLIT_LOCK_DETECT of guest, thus guest can get
rid of the limitation of the FMS list.


> Thanks,
>
> tglx