Re: [PATCH v4 14/22] x86/fpu/xstate: Expand the xstate buffer on the first use of dynamic user state

From: Len Brown
Date: Mon Mar 29 2021 - 09:15:36 EST


On Sat, Mar 20, 2021 at 6:14 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> On Sun, Feb 21 2021 at 10:56, Chang S. Bae wrote:
> > +
> > +/* Update MSR IA32_XFD with xfirstuse_not_detected() if needed. */
> > +static inline void xdisable_switch(struct fpu *prev, struct fpu *next)
> > +{
> > + if (!static_cpu_has(X86_FEATURE_XFD) || !xfirstuse_enabled())
> > + return;
> > +
> > + if (unlikely(prev->state_mask != next->state_mask))
> > + xdisable_setbits(xfirstuse_not_detected(next));
> > +}
>
> So this is invoked on context switch. Toggling bit 18 of MSR_IA32_XFD
> when it does not match. The spec document says:
>
> "System software may disable use of Intel AMX by clearing XCR0[18:17], by
> clearing CR4.OSXSAVE, or by setting IA32_XFD[18]. It is recommended that
> system software initialize AMX state (e.g., by executing TILERELEASE)
> before doing so. This is because maintaining AMX state in a
> non-initialized state may have negative power and performance
> implications."
>
> I'm not seeing anything related to this. Is this a recommendation
> which can be ignored or is that going to be duct taped into the code
> base once the first user complains about slowdowns of their non AMX
> workloads on that machine?

I found the author of this passage, and he agreed to revise it to say this
was targeted primarily at VMMs.

"negative power and performance implications" refers to the fact that
the processor will not enter C6 when AMX INIT=0, instead it will demote
to the next shallower C-state, eg C1E.

(this is because the C6 flow doesn't save the AMX registers)

For customers that have C6 enabled, the inability of a core to enter C6
may impact the maximum turbo frequency of other cores.

thanks,
-Len Brown
Intel Open Source Technology Center