Re: [PATCH] x86/memcpy: Introduce memcpy_mcsafe_fast
From: Linus Torvalds
Date: Mon Apr 20 2020 - 15:06:11 EST
On Mon, Apr 20, 2020 at 11:20 AM Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
>
> * I'm at a loss of why you seem to be suggesting that hardware should
> / could avoid all exceptions. What else could hardware do besides
> throw an exception on consumption of a naturally occurring multi-bit
> ECC error? Data is gone, and only software might know how to recover.
This is classic bogus thinking.
If Intel ever makes ECC DRAM available to everybody, there would be a
_shred_ of logic to that thinking, but right now it's some hw designer
in their mom's basement that has told you that hardware has to throw a
synchronous exception because hardware doesn't know any better.
That hardware designer really doesn't have a _clue_ about the big issues.
The fact is, a synchronous machine check exception is about the
_worst_ thing you can ever do when you encounter something like a
memory error.
It literally means that the software cannot possibly do anything sane
to recover, because the software is in some random place. The hardware
designer didn't think about the fact that the low-level access is
hidden from the software by a compiler and/or a lot of other
infrastructure - maybe microcode, maybe the OS, maybe a scripting
language, yadda yadda.
Absolutely NOBODY can recover at the level of one instruction. The
microcode people already proved that. At the level of "memcpy()", you
do not have a recovery option.
A hardware designer that tells you that you have to magically recover
at an instruction boundary fundamentally DOES NOT UNDERSTAND THE
PROBLEM.
IOW, you have completely uncritically just taken that incorrect
statement of "what else could hardware do" without questioning that
hardware designer AT ALL.
And remember, this is likely the same hardware designer that told you
that it's a good idea to then make machine checks go to every single
CPU in the system.
And this is the same hardware designer that then didn't even give you
enough information to recover.
And this is the same hardware designer that made recovery impossible
anyway, because if the error happened in microcode or in some other
situation, the microcode COULDN'T HANDLE IT EITHER!
In other words, you are talking to people WHO ARE KNOWN TO BE INCOMPETENT.
Seriously. Question them. When they tell you that "it's the only
thing we can possibly do", they do so from being incompetent, and we
have the history to PROVE it.
I don't understand why nobody else seems to be pushing back against
the completely insane and known garbage that is the Intel machine
checks. They are wrong.
The fact is, synchronous errors are the absolute worst possible
interface, exactly because they cause problems in various nasty corner
cases.
We _know_ a lot of those corner cases, for chrissake:
- random standard library routine like "memcpy". How the hell do you
think a memcpy can recover? It can't.
- Unaligned handling where "one" access isn't actually a single access.
- microcode. Intel saw this problem themselves, but instead of making
people realize "oh, synchronous exceptions don't work that well" and
think about the problem, they wasted our time for decades, and then
probably spent a lot of effort in trying to make them work.
- random generic code that isn't able to handle the fault, because IT
SHOULDN'T NEED TO CARE. Low-level filesystems, user mappings, the list
just goes on.
The only thing that can recover tends to be at a *MUCH* higher level
than one instruction access.
So the next time somebody tells you "there's nothing else we can do",
call them out on being incompetent, and call them out on the fact that
history has _shown_ that they are incompetent and wrong. Over and over
again.
I can _trivially_ point to a number of "what else could we do" that
are much better options.
(a) let the corrupted value through, notify things ASYNCHRONOUSLY
that there were problems, and let people handle the issue later when
they are ready to do so.
Yeah, the data was corrupt - but not allowing the user to read it may
mean that the whole file is now inaccessible, even if it was just a
single ECC block that was wrong. I don't know the block-size people
use for ECC, and the fact is, software shouldn't really even need to
care. I may be able to recover from the situation at a higher level.
The data may be recoverable other ways - including just a "I want even
corrupted data, because I might have enough context that I can make
sense of it anyway".
(b) if you have other issues so that you don't have data at all
(maybe it was metadata that was corrupted, don't ask me how that would
happen), just return zeroes, and notify about the problem
ASYNCHRONOUSLY.
And when notifying, give as much useful information as possible: the
virtual and physical address of the data, please, in addition to maybe
lower level bank information. Add a bit for "multiple errors", so that
whoever then later tries to maybe recover, can tell if it has complete
data or not.
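To make the shape of such a notification concrete, here is a minimal sketch in C. Everything in it is a hypothetical illustration: the names mem_err_record, mem_err_log, mem_err_log_report and mem_err_log_pop, the field layout, and the four-slot log size are assumptions made for this example, not any real hardware or kernel interface. The detecting side only appends a record and never interrupts the access; whoever wants to recover drains the log later, and the "multiple errors" bit says whether the log is complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error record: field names are illustrative only. */
struct mem_err_record {
	uint64_t vaddr;       /* virtual address of the bad data */
	uint64_t paddr;       /* physical address of the bad data */
	uint32_t bank;        /* lower-level bank information */
	bool     zero_filled; /* true: zeroes were returned (option b),
	                         false: corrupted value was let through (option a) */
};

#define MEM_ERR_LOG_SLOTS 4 /* arbitrary size for the sketch */

struct mem_err_log {
	struct mem_err_record slots[MEM_ERR_LOG_SLOTS];
	size_t head;          /* index of the oldest pending record */
	size_t count;         /* number of pending records */
	bool multiple_errors; /* set when a record had to be dropped */
};

/*
 * Detection side: note the error and return. The access that hit the
 * error is not interrupted; it just continues with whatever data it got.
 */
static void mem_err_log_report(struct mem_err_log *log,
			       struct mem_err_record rec)
{
	assert(log != NULL);
	if (log->count == MEM_ERR_LOG_SLOTS) {
		/* No room: remember that the log is now incomplete. */
		log->multiple_errors = true;
		return;
	}
	log->slots[(log->head + log->count) % MEM_ERR_LOG_SLOTS] = rec;
	log->count++;
}

/*
 * Recovery side: called at a higher level, whenever that level is ready.
 * Returns false when there is nothing pending.
 */
static bool mem_err_log_pop(struct mem_err_log *log,
			    struct mem_err_record *out)
{
	assert(log != NULL && out != NULL);
	if (log->count == 0)
		return false;
	*out = log->slots[log->head];
	log->head = (log->head + 1) % MEM_ERR_LOG_SLOTS;
	log->count--;
	return true;
}
```

A consumer would loop on mem_err_log_pop() and check multiple_errors afterwards to decide whether it saw every error or only some of them. A real design would also need to handle concurrent producers and consumers, which this single-threaded sketch deliberately ignores.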
The OS can send a SIGBUS with that information to user land that can
then maybe recover. Or it can say "hey, I'm in some recovery mode,
I'll try to limp along with incomplete data". Sometimes "recover"
means "try to limp along, notify the user that their hw is going bad,
but try to use the data anyway".
Again, what Intel engineers actually did with the synchronous
non-recoverable thing was not "the only thing I could possibly have
done".
It was literally the ABSOLUTE WORST POSSIBLE THING they could do, with
potentially a dead machine as a result.
And now - after years of pain - you have the gall to repeat that
idiocy that you got from people who have shown themselves to be
completely wrong in the past?
Why?
Why do you take their known wrong approach as unthinking gospel? Just
because it's been handed down over generations on stone slabs?
I really really detest the whole mcsafe garbage. And I absolutely
*ABHOR* how nobody inside of Intel has apparently ever questioned the
brokenness at a really fundamental level.
That "I throw my hands in the air and just give up" thing is a
disease. It's absolutely not "what else could we do".
Linus