Re: [PATCH] x86/fpu/xstate: clear XSAVE features if DISABLED_MASK set

From: Jon Kohler
Date: Tue May 30 2023 - 23:05:45 EST




> On May 30, 2023, at 6:22 PM, Dave Hansen <dave.hansen@xxxxxxxxx> wrote:
>
> On 5/30/23 13:01, Jon Kohler wrote:
>> Respect DISABLED_MASK when clearing XSAVE features, such that features
>> that are disabled do not appear in the xfeatures mask.
>
> One sanity check that I'd suggest adopting is "How many other users in
> the code do this?" How many DISABLED_MASK_BIT_SET() users are there?

Good tip, thank you. Just cpu_feature_enabled(), though I felt that using
DISABLED_MASK_BIT_SET() really does capture *exactly* what I’m trying to
do here.

Happy to take suggestions, perhaps !cpu_feature_enabled(cid) instead?

Or, I did noodle with the idea of making a cpu_feature_disabled() as an
alias for DISABLED_MASK_BIT_SET(), but that felt like bloating the change
for little gain.

>
>> This is important for kvm_load_{guest|host}_xsave_state, which look
>> at host_xcr0 and will do an expensive xsetbv when the guest and host
>> do not match.
>
> Is that the only problem? kvm_load_guest_xsave_state() seems to have
> some #ifdef CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS code and I can't
> imagine that KVM guests can even use PKRU if this code is compiled out.
>
> Also, this will set XFEATURE_PKRU in xcr0 and go to the trouble of
> XSAVE/XRSTOR'ing it all over the place even for regular tasks.

Correct, KVM isn’t the only beneficiary here as you rightly pointed out. I’m
happy to clarify that in the commit msg if you’d like.

Also, ack on the ifdef’s, I added those myself way back when, this change is
an addendum to that one to nuke the xsetbv overhead.

>> A prime example if CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is disabled,
>> the guest OS will not see PKU masked; however, the guest will incur
>> xsetbv since the host mask will never match the guest, even though
>> DISABLED_MASK16 has DISABLE_PKU set.
>
> OK, so you care because you're seeing KVM go slow. You tracked it down
> to lots of XSETBV's? You said, "what the heck, why is it doing XSETBV
> so much?" and tracked it down to 'max_features' which we use to populate
> XCR0?

Yes and Yes, that is exactly how I arrived here. kvm_load_{guest|host}_xsave_state
is on the critical path for vmexit / vmentry. This overhead sticks out like a sore
thumb when looking at perf top or visualizing threads with a flamegraph if the guest,
for whatever reason, has a different xcr0 than the host. That is easy to do if guests
have PKU masked out.

>
>> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
>> index 0bab497c9436..211ef82b53e3 100644
>> --- a/arch/x86/kernel/fpu/xstate.c
>> +++ b/arch/x86/kernel/fpu/xstate.c
>> @@ -798,7 +798,8 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
>> unsigned short cid = xsave_cpuid_features[i];
>>
>> /* Careful: X86_FEATURE_FPU is 0! */
>> - if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid))
>> + if ((i != XFEATURE_FP && !cid) || !boot_cpu_has(cid) ||
>> + DISABLED_MASK_BIT_SET(cid))
>> fpu_kernel_cfg.max_features &= ~BIT_ULL(i);
>> }
>
> I _think_ I'd rather this just be cpu_feature_enabled(cid) rather than
> using DISABLED_MASK_BIT_SET() directly.
>
> But, I guess this probably also isn't a big deal for _most_ people. Any
> sane distro kernel will just set CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS
> since it's pretty widespread on modern CPUs and works across Intel and
> AMD now.

Ack, I’m using PKU as the key example here, but looking forward this is more of a
correctness thing than anything else. If for any reason, any xsave feature is disabled
In the way that PKU is disabled, it will slip thru the cracks.

If it would make it cleaner, I’m happy to drop the example from the commit msg to
prevent any confusion that this is PKU specific in any way.

Thoughts?