Re: [PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set

From: Dave Hansen
Date: Tue May 30 2023 - 18:22:23 EST


On 5/30/23 13:01, Jon Kohler wrote:
> Respect DISABLED_MASK when clearing XSAVE features, such that features
> that are disabled do not appear in the xfeatures mask.

One sanity check that I'd suggest adopting is "How many other users in
the code do this?" How many DISABLED_MASK_BIT_SET() users are there?

> This is important for kvm_load_{guest|host}_xsave_state, which look
> at host_xcr0 and will do an expensive xsetbv when the guest and host
> do not match.

Is that the only problem? kvm_load_guest_xsave_state() seems to have
some #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS code and I can't
imagine that KVM guests can even use PKRU if this code is compiled out.

Also, this will set XFEATURE_PKRU in xcr0 and go to the trouble of
XSAVE/XRSTOR'ing it all over the place even for regular tasks.

> A prime example if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled,
> the guest OS will not see PKU masked; however, the guest will incur
> xsetbv since the host mask will never match the guest, even though
> DISABLED_MASK16 has DISABLE_PKU set.

OK, so you care because you're seeing KVM go slow. You tracked it down
to lots of XSETBV's? You said, "what the heck, why is it doing XSETBV
so much?" and tracked it down to 'max_features' which we use to populate
XCR0?

> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index 0bab497c9436..211ef82b53e3 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
> unsigned short cid = xsave_cpuid_features[i];
>
> /* Careful: X86_FEATURE_FPU is 0! */
> - if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
> + if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
> + DISABLED_MASK_BIT_SET(cid))
> fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
> }

I _think_ I'd rather this just be cpu_feature_enabled(cid) rather than
using DISABLED_MASK_BIT_SET() directly.

But, I guess this probably also isn't a big deal for _most_ people. Any
sane distro kernel will just set CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
since it's pretty widespread on modern CPUs and works across Intel and
AMD now.