RE: [PATCH] x86/mce: set MCE_IN_KERNEL_COPYIN for all MC-Safe Copy
From: Luck, Tony
Date: Mon May 22 2023 - 14:02:40 EST
>> Is this patch in addition to, or instead of, the earlier core dump patch?
>
> This is in addition to it. In the previous coredump patch, I manually
> called memory_failure_queue() to handle the corrupted page, which is
> similar to your "Copy-on-write poison recovery"[1]. But after some
> discussion, I think we could add MCE_IN_KERNEL_COPYIN to all MC-safe
> copies, so that the corrupted page is handled centrally in
> do_machine_check() instead of one call site at a time.
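The approach described in the quote can be illustrated with a toy user-space model. This is a hedged sketch, not kernel code: the `Fixup` and `KFlag` names, the outcome strings, and the `patched` switch are hypothetical stand-ins, and the pre-patch rule is simplified to "only user-access copies get COPYIN". It only shows the idea of the patch: once every MC-safe copy carries the COPYIN flag, the poisoned page is handed to memory_failure() from one central place instead of each caller queuing it itself.

```python
from enum import Enum, Flag


class Fixup(Enum):
    """Hypothetical stand-ins for kernel exception-table fixup types."""
    UACCESS = 1           # user-space copy such as copy_from_user()
    DEFAULT_MCE_SAFE = 2  # other machine-check-safe copies
    FAULT_MCE_SAFE = 3
    NONE = 4              # no fixup: the kernel cannot recover


class KFlag(Flag):
    """Simplified model of the MCE_IN_KERNEL_RECOV / MCE_IN_KERNEL_COPYIN bits."""
    NONE = 0
    RECOV = 1
    COPYIN = 2


MC_SAFE = (Fixup.UACCESS, Fixup.DEFAULT_MCE_SAFE, Fixup.FAULT_MCE_SAFE)


def classify(fixup: Fixup, patched: bool) -> KFlag:
    """Flags a kernel-mode machine check would carry in this toy model."""
    if fixup not in MC_SAFE:
        return KFlag.NONE
    flags = KFlag.RECOV
    # Simplified pre-patch rule: only user-access copies were tagged COPYIN.
    # With the patch applied, every MC-safe copy is tagged.
    if patched or fixup is Fixup.UACCESS:
        flags |= KFlag.COPYIN
    return flags


def do_machine_check(fixup: Fixup, patched: bool) -> str:
    """Toy version of the final decision in the MCE handler."""
    flags = classify(fixup, patched)
    if not (flags & KFlag.RECOV):
        return "panic"
    if flags & KFlag.COPYIN:
        # Central path: the poisoned page is queued for memory_failure().
        return "fixup + memory_failure"
    # Each caller would have to call memory_failure_queue() on its own.
    return "fixup only"


for fixup in Fixup:
    for patched in (False, True):
        print(f"{fixup.name:17} patched={patched!s:5} -> "
              f"{do_machine_check(fixup, patched)}")
```

The only difference the `patched` switch makes in this model is for the non-user MC-safe copies, which move from "fixup only" to "fixup + memory_failure".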
Thanks for the context. I see how this all fits together now.
Your patch looks good.
Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
-Tony
One small observation from testing. I injected an error into an application, which consumed
the poisoned data and was sent a SIGBUS.
The kernel did not crash (hurrah!)
Console log said:
[ 417.610930] mce: [Hardware Error]: Machine check events logged
[ 417.618372] Memory failure: 0x89167f: recovery action for dirty LRU page: Recovered
... EDAC messages
[ 423.666918] MCE: Killing testprog:4770 due to hardware memory corruption fault at 7f8eccf35000
A core file was generated and saved in /var/lib/systemd/coredump,
but my shell (/bin/bash) only said:
Bus error
not
Bus error (core dumped)
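For context on what the shell is reporting: as far as I understand, bash appends "(core dumped)" only when the wait status of the dead child has the core-dump bit set (WCOREDUMP), not when it finds a core file on disk. On Linux that bit is 0x80 next to the signal number in the low 7 bits. The sketch below mimics that reporting logic from a raw wait status; `shell_report` is a hypothetical helper, not bash code, and the hand-built status values assume the Linux encoding.

```python
import os
import signal


def shell_report(status: int) -> str:
    """Mimic how a shell reports a child that was killed by a signal.

    The "(core dumped)" suffix depends only on the core-dump bit in the
    wait status (os.WCOREDUMP), not on whether a core file was saved.
    """
    if not os.WIFSIGNALED(status):
        return ""
    sig = os.WTERMSIG(status)
    msg = signal.strsignal(sig) or f"Signal {sig}"
    if os.WCOREDUMP(status):
        msg += " (core dumped)"
    return msg


# Hand-built wait statuses (Linux encoding): signal number in the low
# 7 bits, 0x80 set when the kernel reports that a core dump was written.
killed = int(signal.SIGBUS)
killed_with_core = int(signal.SIGBUS) | 0x80
print(shell_report(killed))            # expected on Linux/glibc: Bus error
print(shell_report(killed_with_core))  # expected on Linux/glibc: Bus error (core dumped)
```

So the plain "Bus error" above suggests that the wait status seen by bash lacked the core-dump bit, even though systemd-coredump did save a file.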
-Tony