RE: [PATCH v5] x86/mce: Avoid infinite loop for copy from user recovery
From: Luck, Tony
Date: Tue Feb 02 2021 - 11:11:57 EST
> And the much more important question is, what is the code supposed to
> do when that overflow *actually* happens in real life? Because IINM,
> an overflow condition on the same page would mean killing the task to
> contain the error and not killing the machine...
Correct. In the cases I've actually hit, the second machine check is on the
same address as the first. But from a recovery perspective Linux is going
to take away the whole page anyway ... so not complaining if the second
(or subsequent) access is within the same page makes sense (and that's
what the patch does).
The code can't handle it if a subsequent #MC is to a different page (because
we only have a single spot in the task structure to store the physical page
address). But that looks adequate. If the code is wildly accessing different
pages *and* getting machine checks from those different pages ... then
something is very seriously wrong with the system.
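
To make the same-page vs. different-page rule concrete, here is a rough
user-space sketch of the check. It is not the patch itself: the struct,
the enum, the function name and the 4 KiB page size are all made up for
illustration. The only things taken from the discussion above are the
single saved physical address per task and the comparison at page
granularity.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical outcomes; the real code would queue work or panic instead. */
enum mc_action {
	MC_QUEUE_RECOVERY,	/* first #MC: remember the address, queue recovery */
	MC_ALREADY_QUEUED,	/* repeat #MC inside the saved page: nothing new to do */
	MC_FATAL		/* #MC on a different page: no second slot to track it */
};

/* Hypothetical stand-in for the per-task state: exactly one slot. */
struct task_mc_state {
	uint64_t saved_addr;	/* physical address from the first #MC */
	int count;		/* #MCs seen since recovery was queued */
};

static enum mc_action note_machine_check(struct task_mc_state *t, uint64_t paddr)
{
	t->count++;

	/* First machine check: save the details. */
	if (t->count == 1) {
		t->saved_addr = paddr;
		return MC_QUEUE_RECOVERY;
	}

	/* Later machine checks must land in the page saved by the first one. */
	if ((t->saved_addr >> SKETCH_PAGE_SHIFT) != (paddr >> SKETCH_PAGE_SHIFT))
		return MC_FATAL;

	return MC_ALREADY_QUEUED;
}
```

With a zeroed state, a first call with 0x12345040 queues recovery, a
second call with 0x12345fc0 (same page) is quietly accepted, and a call
with 0x12346000 (next page) is treated as fatal.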
-Tony