Re: [PATCH 47/61] KVM: x86: Squash CPUID 0x2.0 insanity for modern CPUs

From: Vitaly Kuznetsov
Date: Mon Feb 24 2020 - 17:35:09 EST

Next message: Paul Moore: "Re: kernel panic: audit: rate limit exceeded"
Previous message: Jens Axboe: "Re: [PATCH v4 0/3] io_uring: add splice(2) support"
Next in thread: Paolo Bonzini: "Re: [PATCH 47/61] KVM: x86: Squash CPUID 0x2.0 insanity for modern CPUs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> writes:

> Rework CPUID 0x2.0 to be a normal CPUID leaf if it returns "01" in AL,
> i.e. EAX & 0xff.
>
> Long ago, Intel documented CPUID 0x2.0 as being a stateful leaf, e.g. a
> version of the SDM circa 1995 states:
>
> The least-significant byte in register EAX (register AL) indicates the
> number of times the CPUID instruction must be executed with an input
> value of 2 to get a complete description of the processors's caches
> and TLBs. The Pentium Pro family of processors will return a 1.
>
> A 2000 version of the SDM only updated the paragraph to reference
> Intel's new processory family:
>
> The first member of the family of Pentium 4 processors will return a 1.
>
> Fast forward to the present, and Intel's SDM now states:
>
> The least-significant byte in register EAX (register AL) will always
> return 01H. Software should ignore this value and not interpret it as
> an information descriptor.
>
> AMD's APM simply states that CPUID 0x2 is reserved.
>
> Given that CPUID itself was introduced in the Pentium, odds are good
> that the only Intel CPU family that *maybe* implemented a stateful CPUID
> was the P5. Which obviously did not support VMX, or KVM.
>
> In other words, KVM's emulation of a stateful CPUID 0x2.0 has likely
> been dead code from the day it was introduced. This is backed up by
> commit 0fdf8e59faa5c ("KVM: Fix cpuid iteration on multiple leaves per
> eac"), whichs show that the stateful iteration code was completely
> broken when it was introduced by commit 0771671749b59 ("KVM: Enhance
> guest cpuid management"), i.e. not actually tested.
>
> Although it's _extremely_ tempting to yank KVM's stateful code, leave it
> in for now but annotate all its code paths as "unlikely". The code is
> relatively contained, and if by some miracle there is someone running KVM
> on a CPU with a stateful CPUID 0x2, more power to 'em.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> ---
> arch/x86/kvm/cpuid.c | 31 +++++++++++++++++++++----------
> 1 file changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> index 47f61f4497fb..ab2a34337588 100644
> --- a/arch/x86/kvm/cpuid.c
> +++ b/arch/x86/kvm/cpuid.c
> @@ -405,9 +405,6 @@ static struct kvm_cpuid_entry2 *do_host_cpuid(struct kvm_cpuid_array *array,
> &entry->eax, &entry->ebx, &entry->ecx, &entry->edx);
>
> switch (function) {
> - case 2:
> - entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC;
> - break;
> case 4:
> case 7:
> case 0xb:
> @@ -483,17 +480,31 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
> * it since we emulate x2apic in software */
> cpuid_entry_set(entry, X86_FEATURE_X2APIC);
> break;
> - /* function 2 entries are STATEFUL. That is, repeated cpuid commands
> - * may return different values. This forces us to get_cpu() before
> - * issuing the first command, and also to emulate this annoying behavior
> - * in kvm_emulate_cpuid() using KVM_CPUID_FLAG_STATE_READ_NEXT */
> case 2:
> + /*
> + * On ancient CPUs, function 2 entries are STATEFUL. That is,
> + * CPUID(function=2, index=0) may return different results each
> + * time, with the least-significant byte in EAX enumerating the
> + * number of times software should do CPUID(2, 0).
> + *
> + * Modern CPUs (quite likely every CPU KVM has *ever* run on)
> + * are less idiotic. Intel's SDM states that EAX & 0xff "will
> + * always return 01H. Software should ignore this value and not
> + * interpret it as an informational descriptor", while AMD's
> + * APM states that CPUID(2) is reserved.
> + */
> + max_idx = entry->eax & 0xff;
> + if (likely(max_idx <= 1))
> + break;
> +
> + entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC;
> entry->flags |= KVM_CPUID_FLAG_STATE_READ_NEXT;
>
> - for (i = 1, max_idx = entry->eax & 0xff; i < max_idx; ++i) {
> + for (i = 1; i < max_idx; ++i) {
> entry = do_host_cpuid(array, 2, 0);
> if (!entry)
> goto out;
> + entry->flags |= KVM_CPUID_FLAG_STATEFUL_FUNC;
> }
> break;
> /* functions 4 and 0x8000001d have additional index. */
> @@ -903,7 +914,7 @@ static int is_matching_cpuid_entry(struct kvm_cpuid_entry2 *e,
> return 0;
> if ((e->flags & KVM_CPUID_FLAG_SIGNIFCANT_INDEX) && e->index != index)
> return 0;
> - if ((e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) &&
> + if (unlikely(e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC) &&
> !(e->flags & KVM_CPUID_FLAG_STATE_READ_NEXT))
> return 0;
> return 1;
> @@ -920,7 +931,7 @@ struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
>
> e = &vcpu->arch.cpuid_entries[i];
> if (is_matching_cpuid_entry(e, function, index)) {
> - if (e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC)
> + if (unlikely(e->flags & KVM_CPUID_FLAG_STATEFUL_FUNC))
> move_to_next_stateful_cpuid_entry(vcpu, i);
> best = e;
> break;

Reviewed-by: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>

but your history digging results make me think that killing the whole
'statefulness' thing is not a bad idea at all :-)

--
Vitaly

Next message: Paul Moore: "Re: kernel panic: audit: rate limit exceeded"
Previous message: Jens Axboe: "Re: [PATCH v4 0/3] io_uring: add splice(2) support"
Next in thread: Paolo Bonzini: "Re: [PATCH 47/61] KVM: x86: Squash CPUID 0x2.0 insanity for modern CPUs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]