Re: [PATCH v4 22/22] x86/fpu/xstate: Introduce boot-parameters to control state component support

From: Thomas Gleixner
Date: Thu Mar 25 2021 - 21:51:35 EST


Len,

On Thu, Mar 25 2021 at 18:59, Len Brown wrote:
> On Sat, Mar 20, 2021 at 4:57 PM Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
>> We won't enable features which are unknown ever. Keep that presilicon
>> test gunk where it belongs: In the Intel poison cabinet along with the
>> rest of the code which nobody ever want's to see.
>
> I agree, it would be irresponsible to enable unvalidated features by default,
> and pre-silicon "test gunk" should be kept out of the upstream kernel.

Well, that's not my experience from the past and sorry for being
paranoid about that.

> This patch series is intended solely to enable fully validated
> hardware features, with product quality kernel support.

The fact, that the function is broken as provided, definitely supports
that product quality argument.

> The reason that the actual AMX feature isn't mentioned until the 16th
> patch in this series is because all of the patches before it are
> generic state save/restore patches, that are not actually specific to
> AMX.

That's related to 22/22 in which way?

> We call AMX a "simple state feature" -- it actually requires NO KERNEL
> ENABLING above the generic state save/restore to fully support
> userspace AMX applications.

Aside of the unanswered questions vs. the impact of letting it in
initialized state along with the unsolved problem of sigaltstacks...

> While not all ISA extensions can be simple state features, we do
> expect future features to share this trait, and so we want to be sure
> that it is simple to update the kernel to turn those features on (and
> when necessary, off).

History tells me a different story.

> There will be a future CPUID attribute that will help us identify
> future simple-state features.
> For AMX, of course, we simply know.

You believe so, but do you know for sure?

I neither know for sure nor do I believe any of this at all.

Please provide the architectural document which guarantees that and does
so in a way that it can be evaluated by the kernel. Have not seen that,
so it does not exist at all.

Future CPUID attributes are as useful as the tweet of today.

> So after the generic state management support, the kernel enabling of AMX
> is not actually required to run applications. Just like when a new instruction
> is added that re-uses existing state -- the application or library can check
> CPUID and just use it. It is a formality (perhaps an obsolete one), that
> we add every feature flag to /proc/cpuid for the "benefit" of
> userspace.

It's not a formality when the instruction requires kernel support and
from the history of the various incarnations of this command line option
it's just a given that this is going belly up.

Even the current incarnation is broken just from looking at it, so what
the heck are you talking about?

> The reason we propose this cmdline switch is
> 1. Ability of customers to disable a feature right away if an issue is found.
> Unlike the CPUid cmdline that works on flags, this is the ability to turn
> off a feature based on its state number. Ie. There could be 20 features
> that use the same state, and you can turn them all off at once this
> way.

I'm fine with that, but then the disabling has to handle all the things
related to it and not just be on a 'pray that it works' base.

> 2. Ability of customers to enable a feature that is disabled by default
> in their kernel. Yes, this will taint their kernel (thanks Andy),
> but we have customers that want to run the new feature on day 0
> before they have got a distro update to change the default, and this
> gives them a way to do that.

You might know my opinion from previous discussions about this topic,
but let me repeat it for completeness sake:

This is a generic kernel exposed to a gazillion of users and a
minority of them want to have the ability to enable insane
stuff on the command line because:

1) Intel is not able to provide them a test kernel package

2) Their favourite $DISTROVENDOR is not able to provide them a
test kernel package

3) Intel did not manage to get the support for this upstream
on time so the $DISTROVENDOR was able to backport it into
their Frankenkernel

So you seriously want us to have a command line option to enable
whatever the feature of today is because of #1-#3?

Sure, from a Intel managerial POV that's all cool. Not so much when
you put your community hat on and think about the consequences.

Aside of that none of the above #1 - #3 is a technical argument. See
Documentation/process/* for further enlightment.

Of course none of your arguments above have shown up in the changelog of
this command line patch. And none of the potential side effects or down
sides have been mentioned.

Don't blame Chang Bae for that. That patch carries a:

Reviewed-by: Len Brown <len.brown@xxxxxxxxx>

I really have to ask whether you actually looked at the code and the
changelog or just tagged it because some internal procedure requires it.

Either way ....

> Yeah, the cmdline syntax is not a user-friendly mnemonic, and I don't know
> that making it so would be an improvement.
> Like the CPUID cmdline, it is precise, it is future-proof, and it is
> used only in special situations.

The CPUID commandline option is yet another trainwreck which is neither
precise nor future proof if you dare to take a deep technical look. It
should have never been merged and it should be ripped out rather than
proliferated. If you think otherwise then please provide a proper proof
that this commandline option is correct under all circumstances before
abusing it as an argument.

Please try again when you have

- a reviewable and functional correct implementation

- including the ability to evalute that via architectural CPUID

- a changelog which provides an argument which is based on solely
technical criteria instead of wishful managerial thinking or being
just void of content like the current one.

Sorry for looking at this solely from the technical side and thereby
ignoring all the managerial powerpoint slide illusions.

Now putting my managerial hat on:

Given the history of that command line option, I have no idea why
this has even be tried to piggy pack on AMX at all. It's an
orthogonal problem and absolutely not required to make AMX supported
in the first place.

Hrm, unless you expect that a lot of users will need to disable AMX
because ... But that would be a technical reason not to enable it
in the first place, which is not desired from a managerial/marketing
POV, right?

Thanks,

tglx