Re: [PATCH 38/61] KVM: x86: Introduce kvm_cpu_caps to replace runtime CPUID masking

From: Sean Christopherson
Date: Mon Feb 24 2020 - 17:57:47 EST


On Mon, Feb 24, 2020 at 05:32:54PM +0100, Vitaly Kuznetsov wrote:
> Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes:
>
> > Calculate the CPUID masks for KVM_GET_SUPPORTED_CPUID at load time using
> > what is effectively a KVM-adjusted copy of boot_cpu_data, or more
> > precisely, the x86_capability array in boot_cpu_data.
> >
> > In terms of KVM support, the vast majority of CPUID feature bits are
> > constant, and *all* feature support is known at KVM load time. Rather
> > than apply boot_cpu_data, which is effectively read-only after init,
> > at runtime, copy it into a KVM-specific array and use *that* to mask
> > CPUID registers.
> >
> > In additional to consolidating the masking, kvm_cpu_caps can be adjusted
> > by SVM/VMX at load time and thus eliminate all feature bit manipulation
> > in ->set_supported_cpuid().
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > ---
> > arch/x86/kvm/cpuid.c | 229 +++++++++++++++++++++++--------------------
> > arch/x86/kvm/cpuid.h | 19 ++++
> > arch/x86/kvm/x86.c | 2 +
> > 3 files changed, 142 insertions(+), 108 deletions(-)
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 20a7af320291..c2a4c9df49a9 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -24,6 +24,13 @@
> > #include "trace.h"
> > #include "pmu.h"
> >
> > +/*
> > + * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
> > + * aligned to sizeof(unsigned long) because it's not accessed via bitops.
> > + */
> > +u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
> > +EXPORT_SYMBOL_GPL(kvm_cpu_caps);
> > +
> > static u32 xstate_required_size(u64 xstate_bv, bool compacted)
> > {
> > int feature_bit = 0;
> > @@ -259,7 +266,119 @@ static __always_inline void cpuid_entry_mask(struct kvm_cpuid_entry2 *entry,
> > {
> > u32 *reg = cpuid_entry_get_reg(entry, leaf * 32);
> >
> > - *reg &= boot_cpu_data.x86_capability[leaf];
> > + BUILD_BUG_ON(leaf > ARRAY_SIZE(kvm_cpu_caps));
>
> Should this be '>=' ?

Yep, nice catch.

> > + *reg &= kvm_cpu_caps[leaf];
> > +}
> > +
> > +static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
> > +{
> > + reverse_cpuid_check(leaf);
> > + kvm_cpu_caps[leaf] &= mask;
> > +}
> > +
> > +void kvm_set_cpu_caps(void)
> > +{
> > + unsigned f_nx = is_efer_nx() ? F(NX) : 0;
> > +#ifdef CONFIG_X86_64
> > + unsigned f_gbpages = F(GBPAGES);
> > + unsigned f_lm = F(LM);
> > +#else
> > + unsigned f_gbpages = 0;
> > + unsigned f_lm = 0;
> > +#endif
>
> Three too many bare 'unsinged's :-)

Roger that, I'll fix this up.

> > +
> > + BUILD_BUG_ON(sizeof(kvm_cpu_caps) >
> > + sizeof(boot_cpu_data.x86_capability));
> > +
> > + memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
> > + sizeof(kvm_cpu_caps));
> > +
> > + kvm_cpu_cap_mask(CPUID_1_EDX,
> > + F(FPU) | F(VME) | F(DE) | F(PSE) |
> > + F(TSC) | F(MSR) | F(PAE) | F(MCE) |
> > + F(CX8) | F(APIC) | 0 /* Reserved */ | F(SEP) |
> > + F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
> > + F(PAT) | F(PSE36) | 0 /* PSN */ | F(CLFLUSH) |
> > + 0 /* Reserved, DS, ACPI */ | F(MMX) |
> > + F(FXSR) | F(XMM) | F(XMM2) | F(SELFSNOOP) |
> > + 0 /* HTT, TM, Reserved, PBE */
> > + );
> > +
> > + kvm_cpu_cap_mask(CPUID_8000_0001_EDX,
> > + F(FPU) | F(VME) | F(DE) | F(PSE) |
> > + F(TSC) | F(MSR) | F(PAE) | F(MCE) |
> > + F(CX8) | F(APIC) | 0 /* Reserved */ | F(SYSCALL) |
> > + F(MTRR) | F(PGE) | F(MCA) | F(CMOV) |
> > + F(PAT) | F(PSE36) | 0 /* Reserved */ |
> > + f_nx | 0 /* Reserved */ | F(MMXEXT) | F(MMX) |
> > + F(FXSR) | F(FXSR_OPT) | f_gbpages | F(RDTSCP) |
> > + 0 /* Reserved */ | f_lm | F(3DNOWEXT) | F(3DNOW)
> > + );
> > +
> > + kvm_cpu_cap_mask(CPUID_1_ECX,
> > + /* NOTE: MONITOR (and MWAIT) are emulated as NOP,
> > + * but *not* advertised to guests via CPUID ! */
> > + F(XMM3) | F(PCLMULQDQ) | 0 /* DTES64, MONITOR */ |
> > + 0 /* DS-CPL, VMX, SMX, EST */ |
> > + 0 /* TM2 */ | F(SSSE3) | 0 /* CNXT-ID */ | 0 /* Reserved */ |
> > + F(FMA) | F(CX16) | 0 /* xTPR Update, PDCM */ |
> > + F(PCID) | 0 /* Reserved, DCA */ | F(XMM4_1) |
> > + F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
> > + 0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
> > + F(F16C) | F(RDRAND)
> > + );
>
> I would suggest we order things by CPUID_NUM here, i.e.
>
> CPUID_1_ECX
> CPUID_1_EDX
> CPUID_7_1_EAX
> CPUID_7_0_EBX
> CPUID_7_ECX
> CPUID_7_EDX
> CPUID_D_1_EAX
> ...

Hmm, generally speaking I agree, but I didn't want to change the ordering
in this patch when moving the code. Throw a patch on top? Leave as is?
Something else?