Re: [PATCH v4 22/22] x86/fpu/xstate: Introduce boot-parameters to control state component support

From: Len Brown
Date: Fri Mar 26 2021 - 13:54:53 EST


On Fri, Mar 26, 2021 at 11:48 AM Andy Lutomirski <luto@xxxxxxxxxx> wrote:

> > I submit, that after the generic XFD support is in place,
> > there is exactly 1 bit that needs to be flipped to enable
> > user applications to benefit from AMX.
>
> The TILERELEASE opcode itself is rather longer than one bit, and the
> supporting code to invoke it at the right time, to avoid corrupting
> user state, and avoid causing performance regressions merely by
> existing will be orders of magnitude more than 1 bit. Of course, all
> of this is zero bits in the current series because the code is
> missing.entirely.

Please explain why the kernel must know about the TILERELEASE
instruction in order for an AMX application to run properly.

> This isn't just about validation. There's also ABI, performance, and
> correctness.

Thank you for agreeing that this is not about unvalidated features.

> ABI: The AVX-512 enablement *already* broke user ABI. Sadly no one
> told anyone in the kernel community until about 5 years after the
> fact, and it's a bit late to revert AVX-512. But we don't want to
> enable AMX until the ABI has a reasonable chance of being settled.
> Ditto for future features. As it stands, if you xstate.enable some
> 16MB feature, the system may well simply fail to boot as too many user
> processes explode.

At Dave's suggestion, we had a 64 *KB* sanity check on this path.
Boris forced us to remove it, because we could not tell him
how we chose the number 64.

I would be delighted to see a check for 64 KB restored, and that
it be a rejection, rather than warning. At this point, as there is no way
go down that path without manually modifying the kernel, it would
devolve into a sanity check for a hardware (CPUID) bug.

> Performance:
>
> We *still* don't know the performance implications of leaving the AMX
> features in use inappropriately. Does it completely destroy idle?

No.

> Will it literally operate CPUs out of spec such that Intel's
> reliability estimates will be invalidated?

No.

> (We had that with NVMe APST. Let's not repeat this with XSTATE.)

I acknowledge that the possibility of broken hardware always exists.
However, I don't see how the experience with broken NVMe actually applies here,
other than general paranoia about new features (which is, arguably, healthy).

> The performance impacts
> and transitions for AVX-512 are, to put it charitably, forthcoming.

I acknowledge the parallels with AVX-512, in that AMX adds new instructions,
and it has even bigger registers. I also acknowledge that the AVX-512 rollout
(and arguably, its brief existence on client CPUs) was problematic.

My understanding is that Intel continues to learn (a lot) from its mistakes.
I believe that the AVX-512 credits problem has been largely eliminated
on newer Xeons.

My understanding is that AMX is implemented only in CPUs that actually
have the hardware to properly support AMX. If it were not, then that would
be a problem for Intel to deal with in hardware, not a problem for Linux
to deal with in software.

> Correctness: PKRU via the kernel's normal XSAVE path would simply be
> incorrect. Do we really trust that this won't get repeated? Also,
> frankly, a command line option that may well break lots of userspace
> but that we fully expect Intel to recommend setting is not a good
> thing.

There is no analogy between AMX and PKRU, except the fact that they
are both features, and at one time, both were new.

I am unaware of anybody at Intel recommending that any cmdline
be set that would break userspace.

thanks,
Len Brown, Intel Open Source Technology Center