Re: [PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set

From: Sean Christopherson
Date: Wed May 31 2023 - 18:02:35 EST


On Wed, May 31, 2023, Borislav Petkov wrote:
> On Wed, May 31, 2023 at 08:18:34PM +0000, Sean Christopherson wrote:
> > Assert that the to-be-checked bit passed to cpu_feature_enabled() is a
> > compile-time constant instead of applying the DISABLED_MASK_BIT_SET()
> > logic if and only if the bit is a constant. Conditioning the check on
> > the bit being constant instead of requiring the bit to be constant could
> > result in compiler specific kernel behavior, e.g. running on hardware that
> > supports a disabled feature would return %false if the compiler resolved
> > the bit to a constant, but %true if not.
>
> Uff, more mirroring CPUID inconsistencies.
>
> So *actually*, we should clear all those build-time disabled bits from
> x86_capability so that this doesn't happen.

Heh, I almost suggested that, but there is a non-zero amount of code that wants
to ignore the disabled bits and query the "raw" CPUID information. In quotes
because the kernel still massages x86_capability. Changing that behavior will
require auditing a lot of code, because in most cases any breakage will be mostly
silent, e.g. loss of features/performance and not explosions.

E.g. KVM emulates UMIP when it's not supported in hardware, and so advertises UMIP
support irrespective of hardware/host support. But emulating UMIP is imperfect
and suboptimal (requires intercepting L*DT instructions), so KVM intercepts L*DT
instructions iff UMIP is not supported in hardware, as detected by
boot_cpu_has(X86_FEATURE_UMIP).

The comment for cpu_feature_enabled() even calls out this type of use case:

Use the cpu_has() family if you want true runtime testing of CPU features, like
in hypervisor code where you are supporting a possible guest feature where host
support for it is not relevant.

That said, the behavior of cpu_has() is wildly inconsistent, e.g. LA57 is
indirectly cleared in x86_capability if it's a disabled bit because of this code
in early_identify_cpu().

if (!pgtable_l5_enabled())
setup_clear_cpu_cap(X86_FEATURE_LA57);

KVM works around that by manually doing CPUID to query hardware directly:

/* Set LA57 based on hardware capability. */
if (cpuid_ecx(7) & F(LA57))
kvm_cpu_cap_set(X86_FEATURE_LA57);

So yeah, I 100% agree the current state is messy and would love to have
cpu_feature_enabled() be a pure optimization with respect to boot_cpu_has(), but
it's not as trivial at it looks.