Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

From: Naoya Horiguchi
Date: Thu Apr 09 2015 - 04:49:15 EST


On Thu, Apr 09, 2015 at 10:00:30AM +0200, Ingo Molnar wrote:
>
> * Borislav Petkov <bp@xxxxxxxxx> wrote:
>
> > Btw, Ingo had some reservations about this. Ingo?
>
> Yeah, so my concerns are the following:
>
> > kexec disables (or "shoots down") all CPUs other than the crashing
> > CPU before entering the 2nd kernel. However, MCA is still enabled so
> > if an MCE happens and broadcasts to the CPUs after the main thread
> > starts the 2nd kernel (which might not have initialized its MCE handler
> > yet, or might decide not to enable it) the MCE handler runs only on
> > the other CPUs (not on the main thread), leading to a kernel panic
> > during MCE synchronization. The user-visible effect of this bug is a
> > kdump failure.
>
> So the thing is, when we boot up the second kernel there will be a
> window where the old handler isn't valid (because the new kernel has
> its own pagetables, etc.) and the new handler is not installed yet.
>
> If an MCE hits that window, it's bad luck. (unless the bootup sequence
> is rearchitected significantly to allow cross-kernel inheritance of
> MCE handlers.)
>
> So I think we can ignore _that_ race.
>
> The more significant question is: what happens when an MCE arrives
> while the kdump is proceeding - as kdumps can take a long time to
> finish when there's a lot of RAM.

Without this patch, an MCE wakes up the idling CPUs and makes them needlessly
run the MCE handler, which disturbs memory and so can harm the kdump.
This patch improves not only the transition phase but also that window.
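To make the failure mode in the quoted description concrete, here is a toy
Python model. It is an illustration only, not the kernel's
do_machine_check(); TimeoutPolicy, Outcome and broadcast_mce are
hypothetical names. It assumes a simplified rendezvous in which a broadcast
MCE completes only if every CPU that received it checks in, and otherwise
the waiting CPUs time out and apply the configured policy:

```python
from enum import Enum


class TimeoutPolicy(Enum):
    """What waiting CPUs do when the rendezvous times out (toy model)."""
    IGNORE = "ignore"
    PANIC = "panic"


class Outcome(Enum):
    HANDLED = "handled"                  # every CPU checked in
    TIMEOUT_IGNORED = "timeout ignored"  # waiters gave up and continued
    PANIC = "panic"                      # waiters gave up and panicked
    UNHANDLED = "unhandled"              # no CPU ran the handler at all


def broadcast_mce(runs_handler, policy):
    """Model one broadcast MCE (hypothetical, simplified).

    runs_handler[i] is True if CPU i enters the old kernel's MCE handler
    and checks in at the rendezvous. The rendezvous completes only if
    every CPU that received the broadcast checks in.
    """
    arrived = sum(1 for in_handler in runs_handler if in_handler)
    if arrived == 0:
        return Outcome.UNHANDLED
    if arrived == len(runs_handler):
        return Outcome.HANDLED
    # Some CPU never checks in, so the CPUs that did enter the handler
    # wait until the synchronization timeout expires.
    if policy is TimeoutPolicy.PANIC:
        return Outcome.PANIC
    return Outcome.TIMEOUT_IGNORED


# CPU 0 crashed and now runs the 2nd kernel without an MCE handler; CPUs 1-3
# were shot down, but MCA is still enabled, so they enter the old handler
# and wait for CPU 0 until the timeout expires.
kdump_cpus = [False, True, True, True]

print(broadcast_mce(kdump_cpus, TimeoutPolicy.PANIC).value)   # panic
print(broadcast_mce(kdump_cpus, TimeoutPolicy.IGNORE).value)  # timeout ignored
```

In this model, setting the timeout policy back to "ignore" only changes the
last step: the shot-down CPUs still wake up and run the handler before they
give up.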

> But ... since the 'shootdown' is analogous to a CPU hotplug CPU-down
> sequence, I suppose that the existing MCE code should already properly
> handle the case where an MCE arrives on a (supposedly) dead CPU,
> right?

Currently it does not. Tony mentioned an idea for handling it, although that
is not included in this patch.

> In that case installing a separate MCE handler looks like the
> wrong thing.

One difference between kdump and CPU offline is whether we need to handle
MCEs at that point or not. In the CPU offline case, the running CPUs have to
continue their normal operations, so it's important to handle the MCE (i.e.
log it and/or take recovery action), and I think that should be done in our
main MCE handler, do_machine_check().
That's not the case in the kdump situation, where logging or recovery is no
longer possible or necessary. So it makes sense to me to separate the
handler.

> > Our standard MCE handler do_machine_check() assumes a bunch of
> > things about the system's status and it's hard to alter it to cover
> > kexec/kdump context, so add another, kdump-specific one and switch
> > to it.
>
> So I don't like this principle either: 'our current code is a mess
> that might not work, add new one'.
>
> > Note that this problem has existed since the current MCE handler was
> > implemented in 2.6.32, and recently commit 716079f66eac ("mce: Panic
> > when a core has reached a timeout") made it more visible by changing
> > the default behavior of the synchronization timeout from "ignore" to
> > "panic".
>
> Looks like that's the real problem. How about the kdump crash dumper
> sets it back to 'ignore' again when we crash, and also double check
> how we handle various corner cases?

Boris mentioned this in another email, so I'll reply there.

Thanks,
Naoya Horiguchi