Re: [PATCH 22/25] x86/mcheck: Do the init in one place

From: Luck, Tony
Date: Mon Nov 07 2016 - 13:55:41 EST

On Mon, Nov 07, 2016 at 07:45:32PM +0100, Borislav Petkov wrote:
> On Thu, Nov 03, 2016 at 03:50:18PM +0100, Sebastian Andrzej Siewior wrote:
> > Part of the init (memory allocation and so on) is done
> > in mcheck_cpu_init(). While moving the allocation to
> > mcheck_init_device() (where the hotplug calls are initialized) it
> > becomes necessary to move the callback (mcheck_cpu_init()), too.
> >
> > The callback is now removed from identify_cpu() and registered as a
> > hotplug event which is invoked as the very first one which is shortly
> > after the original point of invocation (look at smp_store_cpu_info() and
> > notify_cpu_starting() in smp_callin()).
> > One "visible" difference is that MCE for the boot CPU is not enabled at
> > identify_boot_cpu() time but at device_initcall_sync() time. Either way,
> > both times we had no userland around.
> Uh, hm, I'm not sure about this: so the issue I see with this is that
> the more we're delaying the enabling of MCE reporting - and especially
> setting CR4[MCE] - the more we're increasing the window where a MCE
> during early boot will cause a shutdown. (This is what happens if
> CR4[MCE]=0b).
> Perhaps we should split the init into a very early init which doesn't
> need to be part of hotplug and the rest, which can do mce_disable_cpu()
> and mce_reenable_cpu().
> Tony, how do you see this?

I don't think that helps as much as you'd like it to help (at
least on Intel). A broadcast machine check that finds the boot
CPU has set CR4[MCE]=1 is still going to end up in reset if any
other CPU still has CR4[MCE]=0.
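
To make the point concrete, here is a toy userspace model of that
condition. It is only a sketch, not kernel code: the function name and
the array of per-CPU CR4 values are hypothetical, and the only hardware
fact it relies on is that CR4.MCE is bit 6.

```c
#include <stdbool.h>
#include <stddef.h>

#define CR4_MCE (1UL << 6)	/* CR4 bit 6: machine-check enable */

/*
 * Hypothetical model: cr4[] holds one CR4 value per CPU. A broadcast
 * machine check is delivered to every CPU, so a single CPU with
 * CR4.MCE clear is enough to take the whole system down.
 * Returns true only if every CPU has CR4.MCE set.
 */
static bool broadcast_mce_avoids_shutdown(const unsigned long *cr4,
					  size_t ncpus)
{
	for (size_t i = 0; i < ncpus; i++) {
		if (!(cr4[i] & CR4_MCE))
			return false;	/* this CPU forces the reset */
	}
	return true;
}
```

With only the boot CPU's entry set the function returns false, which is
the case described above; it returns true only once every CPU has set
the bit.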