Re: [PATCH v4 22/22] x86/fpu/xstate: Introduce boot-parameters to control state component support

From: Andy Lutomirski
Date: Fri Mar 26 2021 - 14:13:43 EST


On Fri, Mar 26, 2021 at 10:54 AM Len Brown <lenb@xxxxxxxxxx> wrote:
>
> On Fri, Mar 26, 2021 at 11:48 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>
> > > I submit, that after the generic XFD support is in place,
> > > there is exactly 1 bit that needs to be flipped to enable
> > > user applications to benefit from AMX.
> >
> > The TILERELEASE opcode itself is rather longer than one bit, and the
> > supporting code to invoke it at the right time, to avoid corrupting
> > user state, and avoid causing performance regressions merely by
> > existing will be orders of magnitude more than 1 bit. Of course, all
> > of this is zero bits in the current series because the code is
> > missing.entirely.
>
> Please explain why the kernel must know about the TILERELEASE
> instruction in order for an AMX application to run properly.

I'm just repeating things already said, and this is getting
ridiculous. TILERELEASE isn't needed for an AMX application to run
properly -- it's needed for the rest of the system to run properly, at
least according to Intel's published docs. Quoting the current ISE
document:

3.3 RECOMMENDATIONS FOR SYSTEM SOFTWARE

System software may disable use of Intel AMX by clearing XCR0[18:17],
by clearing CR4.OSXSAVE, or by setting
IA32_XFD[18]. It is recommended that system software initialize AMX
state (e.g., by executing TILERELEASE)
before doing so. This is because maintaining AMX state in a
non-initialized state may have negative power and
performance implications.

Since you reviewed the patch set, I assume you are familiar with how
Linux manages XSTATE. Linux does *not* eagerly load XSTATE on context
switch. Instead, Linux loads XSTATE when the kernel needs it loaded
or before executing user code. This means that the kernel can (and
does, and it's a performance win) execute kernel thread code and/or go
idle, *including long-term deep idle*, with user XSTATE loaded.


>
> > This isn't just about validation. There's also ABI, performance, and
> > correctness.
>
> Thank you for agreeing that this is not about unvalidated features.
>
> > ABI: The AVX-512 enablement *already* broke user ABI. Sadly no one
> > told anyone in the kernel community until about 5 years after the
> > fact, and it's a bit late to revert AVX-512. But we don't want to
> > enable AMX until the ABI has a reasonable chance of being settled.
> > Ditto for future features. As it stands, if you xstate.enable some
> > 16MB feature, the system may well simply fail to boot as too many user
> > processes explode.
>
> At Dave's suggestion, we had a 64 *KB* sanity check on this path.
> Boris forced us to remove it, because we could not tell him
> how we chose the number 64.
>
> I would be delighted to see a check for 64 KB restored, and that
> it be a rejection, rather than warning. At this point, as there is no way
> go down that path without manually modifying the kernel, it would
> devolve into a sanity check for a hardware (CPUID) bug.

This is nuts. The ABI is ALREADY BROKEN. How does picking a random
number quantifying additional breakage help? We do not have a good
design for AVX-512 in Linux, we don't have a good design for AMX in
Linux, and we absolutely don't have a good design for the secret
feature we don't know about yet in Linux.