On Tue, Apr 23, 2019 at 11:23:59AM +0800, Like Xu wrote:
On 2019/4/23 2:35, Sean Christopherson wrote:
#define F(x) bit(X86_FEATURE_##x)
int kvm_update_cpuid(struct kvm_vcpu *vcpu)
@@ -426,6 +436,7 @@ static inline int __do_cpuid_ent(struct kvm_cpuid_entry2 *entry, u32 function,
switch (function) {
case 0:
entry->eax = min(entry->eax, (u32)(f_intel_pt ? 0x14 : 0xd));
+ entry->eax = kvm_supported_intel_mcp() ? 0x1f : entry->eax;
This all seems unnecessary. And by 'all', I mean the existing Intel PT
and XSAVE leaf checks, as well as the new mcp check. entry->eax comes
directly from hardware, and unless I missed something, PT and XSAVE are
only exposed to the guest when they're supported in hardware. In other
words, KVM will never need to adjust entry->eax to expose PT or XSAVE.
We call this function for both case KVM_GET_SUPPORTED_CPUID and
KVM_GET_EMULATED_CPUID although kvm user could reconfig them via
KVM_SET_CPUID* path.
Not that it matters, but __do_cpuid_ent() is only used for the non-emulated
case, KVM_GET_EMULATED_CPUID goes to __do_cpuid_ent_emulated().
The original min() check was added by commit 0771671749b5 ("KVM: Enhance
guest cpuid management"), which doesn't provide any explicit information
on why KVM does min() in the first place.
Exposing cpuid.0.eax in a blind way (with host hardware support)
is not a good practice for guest migration and improves compatibility
requirements.
Right, but isn't the f_intel_pt check for example completely irrelevant?
f_intel_pt is true if and only if hardware supports PT, i.e. CPUID.0.EAX
and thus entry->eax will already be >=0x14.
I don't fully understand whether or not KVM needs to raise the minimum to
0xb regardless of h/w XSAVE support, but it's likely irrelevant in the end.
Anyways, back to 0x1f, kvm_supported_intel_mcp() returns true if and only
if hardware's CPUID.0.EAX >= 0x1f,
i.e. adjusting entry->eax is always a
nop. So if KVM wants to advertise leaf 0x1f only when it's supported in
hardware then adjusting entry->eax is unnecessary, and if KVM wants to
unconditionally advertise 0x1f then adjusting entry->eax should also be
done unconditionally.
Given that the original code
was "entry->eax = min(entry->eax, (u32)0xb);", my *guess* is that the
idea was to always report "Extended Topology Enumeration Leaf" as
supported so that userspace can enumerate the VM's topology to the guest
even when hardware itself doesn't do so.
If the host cpu mode is too antiquated to support 0xb, it wouldn't report
0xb for sure. The host cpuid.0.eax has been over 0xb for a long time and
reached 0x1f in the latest SDM.
AFAICT, the original code keeps minimum cpuid.0.eax out of features guest
just used or at least it claimed to use.
Assuming we want to allow userspace to use "V2 Extended Topology
Enumeration Leaf" regardless of hardware support, then this can simply be:
entry->eax = min(entry->eax, (u32)0x1f);
Or am I completely missing something?