[PATCH v2 0/3] More machine check recovery fixes

From: Tony Luck
Date: Tue Aug 17 2021 - 20:30:03 EST


Fix a couple of issues in machine check handling:

1) If a machine check repeats inside the kernel without the task work
function being called between machine checks, the kernel goes into
an infinite loop.
2) Machine checks in kernel functions copying data from user addresses
send SIGBUS to the user as if the application had consumed the
poison. But this is wrong. The user should see either an -EFAULT
error return or a reduced byte count (in the case of write(2)).
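
As a user-space illustration of item 2 (a sketch only, not part of this
series): a caller of write(2) can tell the outcomes apart from the return
value and errno. The enum and both function names below are hypothetical
and exist only for this example. The sketch assumes that a machine check
during the kernel's copy from the user buffer surfaces as a short count or
-EFAULT, as described above.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical outcome labels; not part of any kernel or libc API. */
enum write_outcome {
    WRITE_COMPLETE,    /* every requested byte was accepted */
    WRITE_SHORT,       /* fewer bytes than requested were accepted */
    WRITE_EFAULT,      /* the source buffer could not be read at all */
    WRITE_OTHER_ERROR  /* any other failure */
};

/* Map a write(2) result onto the outcomes above. */
enum write_outcome classify_write(ssize_t ret, size_t requested, int err)
{
    if (ret < 0)
        return err == EFAULT ? WRITE_EFAULT : WRITE_OTHER_ERROR;
    if ((size_t)ret < requested)
        return WRITE_SHORT;
    return WRITE_COMPLETE;
}

/* Issue one write(2) and classify what the caller observed. */
enum write_outcome write_and_classify(int fd, const void *buf, size_t len)
{
    ssize_t ret = write(fd, buf, len);
    int err = errno;  /* only meaningful when ret < 0 */

    return classify_write(ret, len, err);
}
```

A short count is not specific to poison (pipes and sockets can also return
one), so an application that sees WRITE_SHORT would normally retry from
buf + ret, and would then see -EFAULT if the next byte is still unreadable.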

My latest tests have been on v5.14-rc6 with this patch (that's already
in -mm) applied:
https://lore.kernel.org/r/20210817053703.2267588-1-naoya.horiguchi@xxxxxxxxx

Changes since v1:
1) Fix bug in kill_me_never() that forgot to clear p->mce_count so
repeated recovery in the same task would trigger the panic for
"Machine checks to different user pages"
[Note to Jue Wang ... this *might* be why your test that injects
two errors into the same buffer passed to a write(2) syscall
failed with this message]
2) Re-order patches so that "Avoid infinite loop" can be backported
to stable.

Note that the other two parts of this series depend upon Al Viro's
extensive re-work to lib/iov_iter.c ... so don't try to backport those
without also picking up Al's work.

Tony Luck (3):
x86/mce: Avoid infinite loop for copy from user recovery
x86/mce: Change to not send SIGBUS error during copy from user
x86/mce: Drop copyin special case for #MC

arch/x86/kernel/cpu/mce/core.c | 62 ++++++++++++++++++++++++----------
arch/x86/lib/copy_user_64.S | 13 -------
include/linux/sched.h | 1 +
3 files changed, 45 insertions(+), 31 deletions(-)


base-commit: 7c60610d476766e128cc4284bb6349732cbd6606
--
2.29.2