Re: [PATCH 4/4] x86/tsx: Add cmdline tsx=fake to not clear CPUID bits RTM and HLE

From: Paolo Bonzini
Date: Thu Jul 08 2021 - 05:16:19 EST


On 07/07/21 20:23, Eduardo Habkost wrote:
On Wed, Jul 7, 2021 at 1:18 PM Jim Mattson <jmattson@xxxxxxxxxx> wrote:

On Wed, Jul 7, 2021 at 10:08 AM Eduardo Habkost <ehabkost@xxxxxxxxxx> wrote:

On Wed, Jul 7, 2021 at 12:42 PM Jim Mattson <jmattson@xxxxxxxxxx> wrote:

On Wed, Jul 7, 2021 at 8:09 AM Eduardo Habkost <ehabkost@xxxxxxxxxx> wrote:

CCing libvir-list, Jiri Denemark, Michal Privoznik, so they are aware
that the definition of "supported CPU features" will probably become a
bit more complex in the future.

Has there ever been a clear definition? Family, model, and stepping,
for instance: are these the only values supported? That would make
cross-platform migration impossible. What about the vendor string? Is
that the only value supported? That would make cross-vendor migration
impossible. For the maximum input value for basic CPUID information
(CPUID.0H:EAX), is that the only value supported, or is it the maximum
value supported? On the various individual feature bits, does a '1'
imply that '0' is also supported, or is '1' the only value supported?
What about the feature bits with reversed polarity (e.g.
CPUID.(EAX=07H,ECX=0):EBX.FDP_EXCPTN_ONLY[bit 6])?

This API has never made sense to me. I have no idea how to interpret
what it is telling me.

Is this about GET_SUPPORTED_CPUID, QEMU's query-cpu-model-expansion &
related commands, or the libvirt CPU APIs?

This is my ongoing rant about KVM_GET_SUPPORTED_CPUID.


I agree the definition is not clear. I have tried to enumerate below
what QEMU assumes about the return value of KVM_GET_SUPPORTED_CPUID.
These are a collection of workarounds and feature-specific rules that
are encoded in the kvm_arch_get_supported_cpuid(),
x86_cpu_filter_features(), and cpu_x86_cpuid() functions in QEMU.

1. Passing through the returned values (unchanged) from
KVM_GET_SUPPORTED_CPUID to KVM_SET_CPUID is assumed to be always safe,
as long as the ability to save/resume VCPU state is not required.
(This is the behavior implemented by "-cpu host,migratable=off")

Right, this is basically the definition of KVM_GET_SUPPORTED_CPUID.

2. The safety of setting a bit to a different value requires specific
knowledge about the CPUID bit.
2.1. For a specific set of registers (see below), QEMU assumes it's
safe to set the bit to 0 when KVM_GET_SUPPORTED_CPUID returns 1.
2.2. For a few specific leaves (see below), there are more complex rules.
2.3. For all other leaves, QEMU doesn't use the return value of
KVM_GET_SUPPORTED_CPUID at all (AFAICS).


The CPUID leaves mentioned in 2.1 are:

CPUID[1].EDX
CPUID[1].ECX
CPUID[6].EAX
CPUID[EAX=7,ECX=0].EBX
- This unfortunately includes de-feature bits like FDP_EXCPTN_ONLY and
ZERO_FCS_FDS
CPUID[EAX=7,ECX=0].ECX
CPUID[EAX=7,ECX=0].EDX
CPUID[EAX=7,ECX=1].EAX
CPUID[EAX=0Dh,ECX=0].EAX
CPUID[EAX=0Dh,ECX=0].EDX
CPUID[EAX=0Dh,ECX=1].EAX
- Note that CPUID[0Dh] has additional logic to ensure XSAVE component
info on CPUID is consistent
CPUID[40000001h].EAX
CPUID[40000001h].EDX
CPUID[80000001h].EDX
CPUID[80000001h].ECX
CPUID[80000007h].EDX
CPUID[80000008h].EBX
CPUID[8000000Ah].EDX
CPUID[C0000001h].EDX

Plus all unknown leaves.
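
To make rule 2.1 concrete, here is a minimal sketch of the masking it
implies for a single 32-bit register word. It is written in Python for
brevity and is not QEMU's code: filter_features() and the example bit
positions are hypothetical, chosen only for illustration. It also shows
why the de-feature bits (FDP_EXCPTN_ONLY, ZERO_FCS_FDS) are awkward: a
plain AND treats "clearing to 0" as always safe, which is the wrong
polarity for them.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def filter_features(requested: int, supported: int) -> Tuple[int, int]:
    """Apply rule 2.1 to one CPUID register word.

    Hypothetical helper, not a QEMU function.
    Returns (enabled, dropped):
      enabled - requested bits that KVM_GET_SUPPORTED_CPUID reported as 1
      dropped - requested bits that had to be cleared
    """
    requested &= MASK32
    supported &= MASK32
    enabled = requested & supported            # a supported 1 may stay 1 or become 0
    dropped = requested & ~supported & MASK32  # a supported 0 can never become 1
    return enabled, dropped


# Made-up example: KVM reports bits 0, 4 and 11 as supported,
# while the CPU model asks for bits 4, 11 and 16.
supported = (1 << 0) | (1 << 4) | (1 << 11)
requested = (1 << 4) | (1 << 11) | (1 << 16)

enabled, dropped = filter_features(requested, supported)
print(f"enabled={enabled:#010x} dropped={dropped:#010x}")
```

With these inputs the sketch prints enabled=0x00000810
dropped=0x00010000: bit 16 is dropped, and bit 0 stays clear because
the model did not ask for it.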


Some of the CPUID leaves mentioned in 2.2 are:

CPUID[1].ECX.HYPERVISOR[bit 31]
- Can be enabled unconditionally
CPUID[1].ECX.TSC_DEADLINE_TIMER[bit 24]
- Can be set to 1 if using the in-kernel irqchip and
KVM_CAP_TSC_DEADLINE_TIMER is enabled
CPUID[1].ECX.X2APIC[bit 21]
- Can be set to 1 if using the in-kernel irqchip
CPUID[1].ECX.MONITOR[bit 3]
- Can be set to 1 if KVM_X86_DISABLE_EXITS_MWAIT is enabled

Can always be set to 1, but only makes sense to do so if KVM_X86_DISABLE_EXITS_MWAIT is enabled.

CPUID[6].EAX.ARAT[bit 2]
- Can be enabled unconditionally
CPUID[EAX=7,ECX=0].EDX.ARCH_CAPABILITIES
- Workaround for KVM bug in Linux v4.17-v4.20
CPUID[EAX=14h,ECX=0], CPUID[EAX=14h,ECX=1]
- Most bits must match the host, unless
CPUID[EAX=7,ECX=0].EBX.INTEL_PT[bit 25] is 0
CPUID[80000001h].EDX
- AMD-specific feature flag aliases can be set based on CPUID[1].EDX
CPUID[40000001h].EAX
- KVM_FEATURE_PV_UNHALT requires in-kernel irqchip
- KVM_FEATURE_MSI_EXT_DEST_ID requires split irqchip
CPUID[40000001h].EDX.KVM_HINTS_REALTIME
- Can be enabled unconditionally

This should apply to all of CPUID[40000001h].EDX in the future.
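
For the CPUID[1].ECX entries in the 2.2 list, the rules can be sketched
as a small function that widens the mask returned by
KVM_GET_SUPPORTED_CPUID. This is a Python illustration under stated
assumptions, not the QEMU implementation: adjust_cpuid_1_ecx() and its
three boolean parameters are hypothetical names standing in for "the
in-kernel irqchip is in use", "KVM_CAP_TSC_DEADLINE_TIMER is enabled"
and "KVM_X86_DISABLE_EXITS_MWAIT is enabled". The bit positions are the
ones given in the list above.

```python
MASK32 = 0xFFFFFFFF

# Bit positions as listed for CPUID[1].ECX in this thread.
HYPERVISOR = 1 << 31
TSC_DEADLINE_TIMER = 1 << 24
X2APIC = 1 << 21
MONITOR = 1 << 3


def adjust_cpuid_1_ecx(supported, *, in_kernel_irqchip,
                       has_tsc_deadline_cap, mwait_exits_disabled):
    """Return the CPUID[1].ECX bits userspace may set (hypothetical helper)."""
    ecx = supported & MASK32

    # HYPERVISOR can be enabled unconditionally.
    ecx |= HYPERVISOR

    if in_kernel_irqchip:
        # X2APIC only needs the in-kernel irqchip.
        ecx |= X2APIC
        # TSC_DEADLINE_TIMER additionally needs the capability.
        if has_tsc_deadline_cap:
            ecx |= TSC_DEADLINE_TIMER

    # MONITOR could always be set, but it is only useful when MWAIT
    # exits are disabled, so the sketch keeps that condition.
    if mwait_exits_disabled:
        ecx |= MONITOR

    return ecx


# Example: userspace irqchip, nothing else enabled.
print(hex(adjust_cpuid_1_ecx(0, in_kernel_irqchip=False,
                             has_tsc_deadline_cap=True,
                             mwait_exits_disabled=False)))
```

Keeping the conditions as explicit parameters makes it visible that
these bits depend on how the VM is configured, not only on what the
host CPU and KVM report.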

Thanks Eduardo, this is a great start for kernel-side documentation! I'll wrap it in a patch.

Paolo