Re: [PATCH] x86/mce: Keep quiet in case of broadcasted mce after system panic

From: Xunlei Pang
Date: Thu Feb 16 2017 - 00:34:34 EST


On 01/26/2017 at 02:44 PM, Borislav Petkov wrote:
> On Thu, Jan 26, 2017 at 02:30:02PM +0800, Xunlei Pang wrote:
>> The hardware machine check is hard to reproduce, but the mce code of
>> RHEL7 is quite the same as that of tip/master, anyway we are able to
>> inject software mce to reproduce it.
> Please give me your exact steps so that I can try to reproduce it here
> too.
>

Hi Borislav,

I tried using qemu to inject an SRAO error ("mce -b 0 0 0xb100000000000000 0x5 0x0 0x0").
It works well in the 1st kernel, but it doesn't work for the 1st kernel after kdump
boots (it seems the cpus remaining in the 1st kernel don't respond to the simulated
broadcast mce).

But in theory, we know that cpus belonging to the kdump kernel can't respond to
the old mce handler, so injecting SRAO into a single cpu in the 1st kernel should
behave similarly. For example, I used "... -smp 2 -cpu Haswell" to launch a guest
with broadcast mce supported, and injected SRAO into cpu0 only (i.e. without -b)
through the qemu monitor with "mce 0 0 0xb100000000000000 0x5 0x0 0x0". cpu0 then
times out, panics and reboots the machine as follows (running linux-4.9):
Kernel panic - not syncing: Timeout: Not all CPUs entered broadcast exception handler
Kernel Offset: disabled
Rebooting in 30 seconds..
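
For reference, the status values in the commands above can be decoded with a
minimal sketch like the one below. It assumes the IA32_MCi_STATUS and
IA32_MCG_STATUS bit layout as I understand it from the Intel SDM; the helper
names (decode, classify) are made up for illustration and are not part of qemu
or the kernel.

```python
# Assumed bit positions in IA32_MCi_STATUS (per the Intel SDM layout).
MCI_STATUS_BITS = {
    "VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
    "MISCV": 59, "ADDRV": 58, "PCC": 57, "S": 56, "AR": 55,
}

# Assumed bit positions in IA32_MCG_STATUS.
MCG_STATUS_BITS = {"RIPV": 0, "EIPV": 1, "MCIP": 2}


def decode(value, layout):
    """Return the names of all flags from `layout` that are set in `value`."""
    return [name for name, bit in layout.items() if (value >> bit) & 1]


def classify(status):
    """Roughly classify an MCi_STATUS value (hypothetical helper)."""
    flags = set(decode(status, MCI_STATUS_BITS))
    if not {"VAL", "UC"} <= flags:
        return "not a valid uncorrected error"
    if "PCC" in flags:
        return "fatal (processor context corrupt)"
    if "S" not in flags:
        return "UCNA"
    # S=1: action required (SRAR) or action optional (SRAO).
    return "SRAR" if "AR" in flags else "SRAO"


status = 0xb100000000000000   # 3rd argument of the monitor command
mcg_status = 0x5              # 4th argument of the monitor command

print(decode(status, MCI_STATUS_BITS))      # ['VAL', 'UC', 'EN', 'S']
print(classify(status))                     # SRAO
print(decode(mcg_status, MCG_STATUS_BITS))  # ['RIPV', 'MCIP']
```

So the injected value has VAL, UC, EN and S set with PCC and AR clear, which is
the SRAO signature, and mcgstatus 0x5 means RIPV plus MCIP.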

Regards,
Xunlei