Re: [PATCH v8] x86: mce: kexec: switch MCE handler for kexec/kdump

From: Naoya Horiguchi
Date: Thu Apr 09 2015 - 05:00:37 EST


On Thu, Apr 09, 2015 at 10:21:25AM +0200, Borislav Petkov wrote:
> On Thu, Apr 09, 2015 at 10:00:30AM +0200, Ingo Molnar wrote:
> > So the thing is, when we boot up the second kernel there will be a
> > window where the old handler isn't valid (because the new kernel has
> > its own pagetables, etc.) and the new handler is not installed yet.
> >
> > If an MCE hits that window, it's bad luck. (unless the bootup sequence
> > is rearchitected significantly to allow cross-kernel inheritance of
> > MCE handlers.)
> >
> > So I think we can ignore _that_ race.
>
> Yah, that's the "tough luck" race.
>
> > The more significant question is: what happens when an MCE arrives
> > while the kdump is proceeding - as kdumps can take a long time to
> > finish when there's a lot of RAM.
>
> We say that the dump might be unreliable.
>
> > But ... since the 'shootdown' is analogous to a CPU hotplug CPU-down
> > sequence, I suppose that the existing MCE code should already properly
> > handle the case where an MCE arrives on a (supposedly) dead CPU,
> > right? In that case installing a separate MCE handler looks like the
> > wrong thing.
>
> Hmm, so mce_start() does look only on the online CPUs. So if crash does
> maintain those masks correctly...
>
> > So I don't like this principle either: 'our current code is a mess
> > that might not work, add new one'.
>
> Well, we can try to simplify it in the sense that those assumptions like
> mcelog and other MCE consuming crap and notifier chain are tested for
> their presence before using them...
>
> I'd be open for this if we have a way to test this kdump scenario. For
> now, not even qemu can do that.

I replied about testing.
It might be a little tricky, but I hope it helps.

> > Looks like that's the real problem. How about the kdump crash dumper
> > sets it back to 'ignore' again when we crash, and also double check
> > how we handle various corner cases?
>
> I think I even suggested that at some point. Or was it to increase the
> tolerance level. So Naoya, what's wrong with this again? I forgot.

Even if we raise the tolerant level while kdump is running, that doesn't
prevent idle CPUs from running the MCE handler when an MCE arrives. The
handler makes memory accesses (which loses information from kdump's
viewpoint) and emits "MCE synchronization timeout" messages (which are
unclear and confusing for users). It also leaves a risk of breaking again
when do_machine_check() changes in the future (which may happen because the
same code is shared to handle different situations).

So raising the tolerance is OK as a "minimum change" approach, but the
above downsides have to be weighed against its simplicity.

Thanks,
Naoya Horiguchi

> Because this would be the simplest. Simply set tolerance level to 3 and
> dump away...