Re: [PATCH 2/9] KVM: x86: Replace guts of "goverened" features with comprehensive cpu_caps
From: Sean Christopherson
Date: Mon Nov 27 2023 - 20:24:23 EST
On Sun, Nov 19, 2023, Maxim Levitsky wrote:
> On Fri, 2023-11-10 at 15:55 -0800, Sean Christopherson wrote:
> > @@ -840,23 +856,15 @@ struct kvm_vcpu_arch {
> > struct kvm_hypervisor_cpuid kvm_cpuid;
> >
> > /*
> > - * FIXME: Drop this macro and use KVM_NR_GOVERNED_FEATURES directly
> > - * when "struct kvm_vcpu_arch" is no longer defined in an
> > - * arch/x86/include/asm header. The max is mostly arbitrary, i.e.
> > - * can be increased as necessary.
> > + * Track the effective guest capabilities, i.e. the features the vCPU
> > + * is allowed to use. Typically, but not always, features can be used
> > + * by the guest if and only if both KVM and userspace want to expose
> > + * the feature to the guest. A common exception is for virtualization
> > + * holes, i.e. when KVM can't prevent the guest from using a feature,
> > + * in which case the vCPU "has" the feature regardless of what KVM or
> > + * userspace desires.
> > */
> > -#define KVM_MAX_NR_GOVERNED_FEATURES BITS_PER_LONG
> > -
> > - /*
> > - * Track whether or not the guest is allowed to use features that are
> > - * governed by KVM, where "governed" means KVM needs to manage state
> > - * and/or explicitly enable the feature in hardware. Typically, but
> > - * not always, governed features can be used by the guest if and only
> > - * if both KVM and userspace want to expose the feature to the guest.
> > - */
> > - struct {
> > - DECLARE_BITMAP(enabled, KVM_MAX_NR_GOVERNED_FEATURES);
> > - } governed_features;
> > + u32 cpu_caps[NR_KVM_CPU_CAPS];
>
> Won't it be better to call this 'effective_cpu_caps' or something like that,
> to put emphasis on the fact that these are not exactly the cpu caps that
> userspace wants. Although probably any name will still be somewhat
> confusing.
I'd prefer to keep 'cpu_caps' so that the name is aligned with the APIs, e.g. I
think having "effective" in the field but not e.g. guest_cpu_cap_set() would be
even more confusing.
Also, looking at this again, "effective" isn't strictly correct (my comment about
is also wrong), as virtualization holes that neither userspace nor KVM cares about
aren't reflected in CPUID caps. E.g. things like MOVBE and POPCNT have a CPUID
flag but no way to prevent the guest from using them.
So a truly accurate name would have to be something like
effective_cpu_caps_that_kvm_cares_about. :-)
> > u64 reserved_gpa_bits;
> > int maxphyaddr;
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 4f464187b063..4bf3c2d4dc7c 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -327,9 +327,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
> > struct kvm_cpuid_entry2 *best;
> > bool allow_gbpages;
> >
> > - BUILD_BUG_ON(KVM_NR_GOVERNED_FEATURES > KVM_MAX_NR_GOVERNED_FEATURES);
> > - bitmap_zero(vcpu->arch.governed_features.enabled,
> > - KVM_MAX_NR_GOVERNED_FEATURES);
> > + memset(vcpu->arch.cpu_caps, 0, sizeof(vcpu->arch.cpu_caps));
> >
> > /*
> > * If TDP is enabled, let the guest use GBPAGES if they're supported in
> > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > index 245416ffa34c..9f18c4395b71 100644
> > --- a/arch/x86/kvm/cpuid.h
> > +++ b/arch/x86/kvm/cpuid.h
> > @@ -255,12 +255,12 @@ static __always_inline bool kvm_is_governed_feature(unsigned int x86_feature)
> > }
> >
> > static __always_inline void guest_cpu_cap_set(struct kvm_vcpu *vcpu,
> > - unsigned int x86_feature)
> > + unsigned int x86_feature)
> > {
> > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
> > + unsigned int x86_leaf = __feature_leaf(x86_feature);
> >
> > - __set_bit(kvm_governed_feature_index(x86_feature),
> > - vcpu->arch.governed_features.enabled);
> > + reverse_cpuid_check(x86_leaf);
> > + vcpu->arch.cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
> > }
> >
> > static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> > @@ -273,10 +273,10 @@ static __always_inline void guest_cpu_cap_check_and_set(struct kvm_vcpu *vcpu,
> > static __always_inline bool guest_cpu_cap_has(struct kvm_vcpu *vcpu,
> > unsigned int x86_feature)
> > {
> > - BUILD_BUG_ON(!kvm_is_governed_feature(x86_feature));
> > + unsigned int x86_leaf = __feature_leaf(x86_feature);
> >
> > - return test_bit(kvm_governed_feature_index(x86_feature),
> > - vcpu->arch.governed_features.enabled);
> > + reverse_cpuid_check(x86_leaf);
> > + return vcpu->arch.cpu_caps[x86_leaf] & __feature_bit(x86_feature);
> > }
>
> It might make sense to think about extracting the common code between
> kvm_cpu_cap* and guest_cpu_cap*.
>
> The whole notion of reverse cpuid, KVM only leaves, and other nice things
> that it has is already very confusing, but as I understand there is
> no better way of doing it.
> But there must be a way to avoid at least duplicating this logic.
Yeah, that thought crossed my mind too, but de-duplicating the interesting bits
would require macros, which I think would be a net negative for this code. I
could definitely be convinced otherwise though, I do love me some macros :-)
Something easy I can, should, and will do regardless is to move reverse_cpuid_check()
into __feature_leaf(). It's already in __feature_bit(), and that would cut down
the copy+paste at least a little bit even if we do/don't use macros.
> Also speaking of this logic, it would be nice to document it.
> E.g for 'kvm_only_cpuid_leafs' it would be nice to have an explanation
> for each entry on why it is needed.
As in, which bits KVM cares about? That's guaranteed to become stale, and the
high level answer is always "because KVM needs it, but the kernel does not". The
actual bits are kinda sorta documented by KVM_X86_FEATURE() usage, and any
conflicts with the kernel's cpufeatures.h should result in a build error due to
KVM trying to redefined a macro.
> Just curious: I wonder why Intel called them leaves?
> CPUID leaves are just table entries, I don't see any tree there.
LOL, I suppose a tree without branches can still have leaves?
> Finally isn't plural of "leaf" is "leaves"?
Heh, yes, "leaves" is the more standar plural form of "leaf". And looking at the
SDM, "leafs" is used mostly for GETSEC and ENCL{S,U} "leafs". I spent a lot of
my formative x86 years doing SMX stuff, and then way to much time on SGX, so I
guess "leafs" just stuck with me.