Re: [PATCH v7] x86: mce: kexec: switch MCE handler for kexec/kdump

From: Naoya Horiguchi
Date: Tue Apr 07 2015 - 04:03:37 EST


On Mon, Apr 06, 2015 at 01:56:40PM +0200, Borislav Petkov wrote:
> On Fri, Mar 06, 2015 at 10:22:16AM +0000, Naoya Horiguchi wrote:
> > kexec disables (or "shoots down") all CPUs other than a crashing CPU before
> > entering the 2nd kernel. But the MCE handler is still enabled after that,
> > so if MCE happens and broadcasts over the CPUs after the main thread starts
> > the 2nd kernel (which might not initialize MCE device yet, or might decide
> > not to enable it,) MCE handler runs only on the other CPUs (not on the main
>
> You lost me here: "MCE device"?

I meant "MCE handler", sorry.

> > thread,) leading to kernel panic with MCE synchronization. The user-visible
> > effect of this bug is kdump failure.
> >
> > Our standard MCE handler do_machine_check() assumes some about system's
> > status and it's hard to alter it to cover kexec/kdump context, so let's add
> > another kdump-specific one and switch to it.
> >
> > Note that this problem exists since current MCE handler was implemented in
>
> Did we have kdump then at all and was that a "problem" then even?

At that time we did have kdump because MCE synchronization timeout didn't
cause panic. But there're some minor problems like
1) idling CPUs can respond to MCE and try to do some work (like reading MCE
banks and doing mce_log()), which means disturbing memory,
2) no direct hint about unreliability of the kdump.

...

> > +
> > + rdmsrl(MSR_IA32_MCG_STATUS, m.mcgstatus);
> > + if (regs && m.mcgstatus & (MCG_STATUS_RIPV|MCG_STATUS_EIPV))
> > + m.cs = regs->cs;
>
> Newline here please.

OK.

> > + for (i = 0; i < mca_cfg.banks; i++) {
> > + rdmsrl(MSR_IA32_MCx_STATUS(i), m.status);
> > + severity = mce_severity(&m, mca_cfg.tolerant, &nmsg, true);
>
> MCE severity got reworked recently, please redo your patch against
> latest tip/master:

Rebased to tip/master, and fixed the conflict.

> In file included from arch/x86/kernel/cpu/mcheck/mce-severity.c:18:0:
> arch/x86/kernel/cpu/mcheck/mce-internal.h:15:14: error: âmce_severityâ redeclared as different kind of symbol
> extern int (*mce_severity)(struct mce *a, int tolerant, char **msg, bool is_excp);
> ^
> In file included from arch/x86/kernel/cpu/mcheck/mce-severity.c:16:0:
> ./arch/x86/include/asm/mce.h:270:5: note: previous declaration of âmce_severityâ was here
> int mce_severity(struct mce *a, int tolerant, char **msg, bool is_excp);
> ^
> arch/x86/kernel/cpu/mcheck/mce-severity.c:274:7: error: âmce_severityâ redeclared as different kind of symbol
> int (*mce_severity)(struct mce *m, int tolerant, char **msg, bool is_excp) =
> ^
> In file included from arch/x86/kernel/cpu/mcheck/mce-severity.c:16:0:
> ./arch/x86/include/asm/mce.h:270:5: note: previous declaration of âmce_severityâ was here
> int mce_severity(struct mce *a, int tolerant, char **msg, bool is_excp);
> ^
> make[4]: *** [arch/x86/kernel/cpu/mcheck/mce-severity.o] Error 1
> make[4]: *** Waiting for unfinished jobs....
> In file included from arch/x86/kernel/cpu/mcheck/mce.c:50:0:
> arch/x86/kernel/cpu/mcheck/mce-internal.h:15:14: error: âmce_severityâ redeclared as different kind of symbol
> extern int (*mce_severity)(struct mce *a, int tolerant, char **msg, bool is_excp);
> ^
> In file included from arch/x86/kernel/cpu/mcheck/mce.c:47:0:
> ./arch/x86/include/asm/mce.h:270:5: note: previous declaration of âmce_severityâ was here
> int mce_severity(struct mce *a, int tolerant, char **msg, bool is_excp);
> ^
> make[4]: *** [arch/x86/kernel/cpu/mcheck/mce.o] Error 1
> make[3]: *** [arch/x86/kernel/cpu/mcheck] Error 2
> make[3]: *** Waiting for unfinished jobs....
> make[2]: *** [arch/x86/kernel/cpu] Error 2
> make[1]: *** [arch/x86/kernel] Error 2
> make: *** [arch/x86] Error 2
> make: *** Waiting for unfinished jobs....
>
> > + if (severity > worst) {
> > + worst = severity;
> > + msg = nmsg;
> > + }
> > + }
> > +
> > + if (worst >= MCE_UC_SEVERITY)
> > + panic("unignorable machine check under kdumping: %s", msg);
>
> Just say:
>
> panic("kdump: Fatal machine check: %s", msg);

OK.

> > +
> > + pr_emerg("MCE triggered under kdumping. If you are lucky enough, you will have a kdump. Otherwise, this is a dying message.\n");
>
> This message is also funny.
>
> How about:
>
> pr_emerg("kdump: Non-fatal MCE detected - kernel dump might be unreliable.\n");

OK, and I noticed that printing out msg in this case is helpful.

> > +
> > +clear_mcg_status:
> > + wrmsrl(MSR_IA32_MCG_STATUS, 0);
> > +out:
> > + sync_core();
> > +}
> > +
> > +/*
> > + * Switch the MCE handler to kdump-specific one
> > + *
> > + * Standard MCE handler do_machine_check() is not designed for kexec/kdump
> > + * context, where we can't expect MCE recovering and logging to work fine
> > + * because the kernel might be unstable (all CPUs except one must be quiet
> > + * as possible.)
>
> "... all CPUs except one must be idle."
>
> > + *
> > + * Moreover, in such situation getting kdump is prior to doing something for
>
> ... " in such situations, getting a kernel dump is more important than handling
> MCEs..."
>
> > + * MCEs because what the users are really interested in is to find what caused
>
> find out
>
> > + * the crashing, not what caused the crashing to fail.
>
> ... the crash." Fullstop.

All these comments are included, thanks.

> > + *
> > + * So let's switch MCE handler to the one suitable for kexec/kdump situation.
> > + */
> > +void switch_mce_handler_for_kdump(void)
>
> s/switch_mce_handler_for_kdump/kdump_setup_mce/

Yes, it's simpler.

Thanks,
Naoya Horiguchi

> > +{
> > + kdump_cpu = smp_processor_id();
> > + machine_check_vector = machine_check_under_kdump;
> > +}
> > +#else
> > +static inline void switch_mce_handler_for_kdump(void) {}
> > +#endif
> > +
> > /*
> > * This is used to VMCLEAR all VMCSs loaded on the
> > * processor. And when loading kvm_intel module, the
> > @@ -166,6 +251,8 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > /* The kernel is broken so disable interrupts */
> > local_irq_disable();
> >
> > + switch_mce_handler_for_kdump();
> > +
> > kdump_nmi_shootdown_cpus();
> >
> > /*
> > --
> > 1.9.3
> >
>
> --
> Regards/Gruss,
> Boris.
>
> ECO tip #101: Trim your mails when you reply.
> --
> N‹§²æ¸›yú²X¬¶ÇvØ–)Þ{.nlj·¥Š{±‘êX§¶›¡Ü}©ž²ÆzÚj:+v‰¨¾«‘êZ+€Êzf£¢·hšˆ§~†­†Ûÿû®w¥¢¸?™¨è&¢)ßf”ùy§m…á«a¶Úÿ 0¶ìå