Re: [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the first use of dynamic user state

From: Thomas Gleixner
Date: Mon Mar 29 2021 - 09:34:16 EST


On Mon, Mar 29 2021 at 09:14, Len Brown wrote:
> On Sat, Mar 20, 2021 at 6:14 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>>
>> On Sun, Feb 21 2021 at 10:56, Chang S. Bae wrote:
>> > +
>> > +/* Update MSR IA32_XFD with xfirstuse_not_detected() if needed. */
>> > +static inline void xdisable_switch(struct fpu *prev, struct fpu *next)
>> > +{
>> > + if (!static_cpu_has(X86_FEATURE_XFD) || !xfirstuse_enabled())
>> > + return;
>> > +
>> > + if (unlikely(prev->state_mask != next->state_mask))
>> > + xdisable_setbits(xfirstuse_not_detected(next));
>> > +}
>>
>> So this is invoked on context switch. Toggling bit 18 of MSR_IA32_XFD
>> when it does not match. The spec document says:
>>
>> "System software may disable use of Intel AMX by clearing XCR0[18:17], by
>> clearing CR4.OSXSAVE, or by setting IA32_XFD[18]. It is recommended that
>> system software initialize AMX state (e.g., by executing TILERELEASE)
>> before doing so. This is because maintaining AMX state in a
>> non-initialized state may have negative power and performance
>> implications."
>>
>> I'm not seeing anything related to this. Is this a recommendation
>> which can be ignored or is that going to be duct taped into the code
>> base once the first user complains about slowdowns of their non AMX
>> workloads on that machine?
>
> I found the author of this passage, and he agreed to revise it to say this
> was targeted primarily at VMMs.

Why would this only a problem for VMMs?

> "negative power and performance implications" refers to the fact that
> the processor will not enter C6 when AMX INIT=0, instead it will demote
> to the next shallower C-state, eg C1E.
>
> (this is because the C6 flow doesn't save the AMX registers)
>
> For customers that have C6 enabled, the inability of a core to enter C6
> may impact the maximum turbo frequency of other cores.

That's the same on bare metal, right?

Thanks,

tglx