Re: [PATCH] KVM: x86: Add Intel CPUID.1F cpuid emulation support

From: Sean Christopherson
Date: Mon Apr 22 2019 - 14:41:05 EST


On Mon, Apr 22, 2019 at 02:40:34PM +0800, Like Xu wrote:
> Expose Intel V2 Extended Topology Enumeration Leaf to guest only when
> host system has multiple software-visible die within each package.
>
> Signed-off-by: Like Xu <like.xu@xxxxxxxxxxxxxxx>
> ---
> arch/x86/kvm/cpuid.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index fd39516..9fc14f2 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -65,6 +65,16 @@ u64 kvm_supported_xcr0(void)
> return xcr0;
> }
>
> +/* We need to check if the host cpu has multi-chip packaging technology. */
> +static bool kvm_supported_intel_mcp(void)
> +{
> + u32 eax, ignored;
> +
> + cpuid_count(0x1f, 0, &eax, &ignored, &ignored, &ignored);
> +
> + return boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && (eax != 0);
> +}
> +
> #define F(x) bit(X86_FEATURE_##x)
>
> int kvm_update_cpuid(struct kvm_vcpu *vcpu)
> @@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> switch (function) {
> case 0:
> entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
> + entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;

This all seems unnecessary. And by 'all', I mean the existing Intel PT
and XSAVE leaf checks, as well as the new mcp check. entry->eax comes
directly from hardware, and unless I missed something, PT and XSAVE are
only exposed to the guest when they're supported in hardware. In other
words, KVM will never need to adjust entry->eax to expose PT or XSAVE.

The original min() check was added by commit 0771671749b5 ("KVM: Enhance
guest cpuid management"), which doesn't provide any explicit information
on why KVM does min() in the first place. Given that the original code
was "entry->eax = min(entry->eax, (u32)0xb);", my *guess* is that the
idea was to always report "Extended Topology Enumeration Leaf" as
supported so that userspace can enumerate the VM's topology to the guest
even when hardware itself doesn't do so.

Assuming we want to allow userspace to use "V2 Extended Topology
Enumeration Leaf" regardless of hardware support, then this can simply be:

entry->eax = min(entry->eax, (u32)0x1f);

Or am I completely missing something?
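
For reference, the effect of that min() can be sketched in isolation. This
is a standalone illustration, not KVM code: clamp_max_basic_leaf() and its
parameter names are hypothetical, and the cap value is whatever the caller
chooses (e.g. 0x1f for the V2 Extended Topology Enumeration Leaf).

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a KVM function: cap the maximum basic CPUID
 * leaf (the value reported in CPUID.0:EAX) at 'cap'.  This mirrors the
 * behavior of min(entry->eax, cap).
 */
static uint32_t clamp_max_basic_leaf(uint32_t hw_max_leaf, uint32_t cap)
{
	return hw_max_leaf < cap ? hw_max_leaf : cap;
}
```

Note that min() can only lower the reported maximum leaf; if hardware
reports a maximum below the cap, the hardware value is passed through
unchanged.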

> break;
> case 1:
> entry->edx &= kvm_cpuid_1_edx_x86_features;
> @@ -544,6 +555,8 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
> entry->edx = edx.full;
> break;
> }
> + /* function 0x1f has additional index. */

The original comment is rather useless; it's obvious from the code that
the leaf has additional indices. No need to repeat its sins. A more useful
comment would explain that 0x1f and 0xb have identical formats and thus
can be handled by common code.

Which raises the question: why does leaf 0x1f exist? AFAICT the only
difference is that 0x1f supports additional "level types", but 0x1f's
types are backwards compatible. Any idea why leaf 0xb wasn't simply
extended for the new types?
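
To make the "identical formats" point concrete, here is a rough sketch of
decoding the level type, which sits in ECX[15:8] of each sub-leaf for both
0xb and 0x1f. The helper names are hypothetical and this is not KVM code;
the type values (0 invalid, 1 SMT, 2 core, and 3/4/5 for module/tile/die,
the latter three only being reported by 0x1f) are my reading of the SDM.

```c
#include <stdint.h>

/* Level type is ECX[15:8] in every sub-leaf of both 0xb and 0x1f. */
static inline uint32_t topo_level_type(uint32_t ecx)
{
	return (ecx >> 8) & 0xff;
}

/* Hypothetical helper: map a level type to a human-readable name. */
static const char *topo_level_name(uint32_t type)
{
	switch (type) {
	case 0: return "invalid";
	case 1: return "SMT";
	case 2: return "core";
	case 3: return "module";	/* 0x1f only */
	case 4: return "tile";		/* 0x1f only */
	case 5: return "die";		/* 0x1f only */
	default: return "unknown";
	}
}
```

Since the field layout is the same, one decoder works for either leaf; only
the set of type values that can show up differs.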

> + case 0x1f:
> /* function 0xb has additional index. */
> case 0xb: {
> int i, level_type;
> --
> 1.8.3.1
>