Re: [patch 01/10] x86/fpu/signal: Clarify exception handling in restore_fpregs_from_user()

From: Thomas Gleixner
Date: Mon Aug 30 2021 - 20:34:23 EST


Linus,

On Mon, Aug 30 2021 at 15:00, Linus Torvalds wrote:
> But since the Intel machine check stuff is so misdesigned and doesn't
> work on any normal machines, most people can't test any of this, none
> of this matters, and it's only broken on those "serious enterprise
> machines" setups that people think are better, but are actually just
> almost entirely untested and thus don't work right.

what's worse is that even if you have access to such a machine, there is
no documented way to do proper hardware based error injection.

The injection mechanism which claims to do hardware error injection in
arch/x86/kernel/cpu/mce/inject.c is a farce:

All it does is to "prepare" the MSRs with some fake error values and
raising #MC via int 18 afterwards in the hope that the previously
prepared MSR values are still valid. Great way to test stuff by setting
the MSR to the expected failure value and then raising the exception in
software.

NHM had a documented mechanism to inject at least ECC failures at the
hardware level, but with the later memory controllers this ended up in
the documentation black hole along with all the other undocumented real
HW injection mechanisms which allow actual testing of this stuff.

The HW injection mechanisms definitely exist, but without documentation
they are useless. Intel still thinks that the secrecy around that stuff
is valuable and they can get away with those untestable mechanisms even
for their endeavours in the safety critical space.

It's pretty much the same approach as security through obscurity, but in
the safety case that's even more hillarious.

Though we all know what the 'S' in INTEL stands for... I used to be
Security, but nowadays it's Security _and_ Safety.

Thanks,

tglx