[PATCH v4] x86: mce: kexec: switch MCE handler for kexec/kdump

From: Naoya Horiguchi
Date: Wed Mar 04 2015 - 02:52:47 EST


On Tue, Mar 03, 2015 at 06:09:27PM +0000, Luck, Tony wrote:
> > +static void machine_check_under_kdump(struct pt_regs *regs, long error_code)
> > +{
> > +	if (mca_cfg.kdump_cpu == smp_processor_id())
> > +		pr_emerg("MCE triggered when kdumping. If you are lucky enough, you will have a kdump. Otherwise, this is a dying message.\n");
>
> I'm worried about the SRAR case here. Your code just returns, which will trigger the same machine check again. The system will spin forever printing this message.

You're right.

> I think you have to look at MCG_STATUS and scan the machine check banks to make a choice. There are some simple cases:
>
> MCG_STATUS.RIPV=0 -> cannot return (where will the cpu go - you have no idea!)
> SRAO -> safe to just return
> SRAR -> should not return
>
> But the rest may require some thought. If there is a PCC=1 error, then you may end up with a corrupt dump. Perhaps this case will already be covered by RIPV==0?

A PCC=1 error is defined as a UC error in the SDM, and our severity assessor
returns MCE_PANIC_SEVERITY for it, so kdump should abort in such a case.

So I added severity-checking code to the new MCE handler in the following patch,
which borrows some more code from do_machine_check(). Please also see the
changelog in the patch description.
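
To make the cases discussed above concrete, here is a rough user-space sketch
of the decision. It is not the code in the patch. The bit positions follow the
SDM; kdump_mce_decide() and the enum are hypothetical names used only for
illustration, and the real severity table considers more than this (OVER, EN,
the error code, the interrupted context).

```c
#include <stddef.h>
#include <stdint.h>

/* Bit positions as defined in the SDM for IA32_MCG_STATUS / IA32_MCi_STATUS. */
#define MCG_STATUS_RIPV	(1ULL << 0)	/* restart IP valid */
#define MCI_STATUS_VAL	(1ULL << 63)	/* bank holds a valid error */
#define MCI_STATUS_UC	(1ULL << 61)	/* uncorrected error */
#define MCI_STATUS_PCC	(1ULL << 57)	/* processor context corrupt */
#define MCI_STATUS_S	(1ULL << 56)	/* signaled via machine check */
#define MCI_STATUS_AR	(1ULL << 55)	/* action required */

/* Hypothetical result type: keep dumping, or give up on the dump. */
enum kdump_mce_action { KDUMP_MCE_CONTINUE, KDUMP_MCE_ABORT };

/*
 * Hypothetical helper: decide from MCG_STATUS and a snapshot of the
 * per-bank status registers whether the handler may simply return.
 */
static enum kdump_mce_action
kdump_mce_decide(uint64_t mcg_status, const uint64_t *bank_status, size_t nbanks)
{
	size_t i;

	/* RIPV=0: there is no valid address to return to. */
	if (!(mcg_status & MCG_STATUS_RIPV))
		return KDUMP_MCE_ABORT;

	for (i = 0; i < nbanks; i++) {
		uint64_t s = bank_status[i];

		if (!(s & MCI_STATUS_VAL))
			continue;	/* nothing logged in this bank */
		if (!(s & MCI_STATUS_UC))
			continue;	/* corrected error, harmless here */
		if (s & MCI_STATUS_PCC)
			return KDUMP_MCE_ABORT;	/* the dump may be corrupt */
		if ((s & MCI_STATUS_S) && (s & MCI_STATUS_AR))
			return KDUMP_MCE_ABORT;	/* SRAR: returning re-triggers the MCE */
		/* SRAO (S=1, AR=0) or UCNA (S=0): safe to just return. */
	}

	return KDUMP_MCE_CONTINUE;
}
```

In this sketch both the PCC=1 case and the SRAR case end in an abort, so the
handler never returns into a state where the same machine check would fire
again.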

Thanks,
Naoya Horiguchi
---