On Thu, Nov 03, 2016 at 03:50:18PM +0100, Sebastian Andrzej Siewior wrote:
> Part of the init (memory allocation and so on) is done
> in mcheck_cpu_init(). While moving the allocation to
> mcheck_init_device() (where the hotplug calls are initialized) it
> becomes necessary to move the callback (mcheck_cpu_init()), too.
> The callback is now removed from identify_cpu() and registered as a
> hotplug event which is invoked as the very first one which is shortly
> after the original point of invocation (look at smp_store_cpu_info() and
> notify_cpu_starting() in smp_callin()).
> One "visible" difference is that MCE for the boot CPU is not enabled at
> identify_boot_cpu() time but at device_initcall_sync() time. Either way,
> both times we had no userland around.

Uh, hm, I'm not sure about this: so the issue I see with this is that
the more we're delaying the enabling of MCE reporting - and especially
setting CR4[MCE] - the more we're increasing the window where an MCE
during early boot will cause a shutdown. (This is what happens if

Perhaps we should split the init into a very early init which doesn't
need to be part of hotplug and the rest, which can do mce_disable_cpu()
and mce_reenable_cpu().
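
A rough userspace model of that split, not kernel code and untested against the tree. Every name below is hypothetical, and the two booleans only stand in for CR4[MCE] and per-bank reporting. The early step needs no allocation and runs once per CPU; only the online/offline pair would hang off hotplug, in the role mce_reenable_cpu() and mce_disable_cpu() would play.

```c
/* Userspace model only; all names are hypothetical, none exist in the kernel. */
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

struct model_cpu {
	bool cr4_mce;		/* stands in for CR4[MCE] */
	bool banks_enabled;	/* stands in for per-bank MCE reporting */
};

static struct model_cpu model_cpus[MODEL_NR_CPUS];

/* Very early per-CPU step: no allocation, not part of hotplug. */
static void model_mce_early_init(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_cpus[cpu].cr4_mce = true;
}

/* Hotplug "online" step: the part that can be undone and redone later. */
static int model_mce_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (!model_cpus[cpu].cr4_mce)
		return -1;	/* early init has not run on this CPU */
	model_cpus[cpu].banks_enabled = true;
	return 0;
}

/* Hotplug "offline" step: reporting goes away, the early state stays. */
static void model_mce_cpu_offline(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	model_cpus[cpu].banks_enabled = false;
}
```

In this model the window without the CR4[MCE] stand-in ends at model_mce_early_init(), regardless of when the hotplug callbacks get registered.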

Tony, how do you see this?

> @@ -2584,11 +2580,26 @@ static __init int mcheck_init_device(void)
> goto err_out;
> }
> + err = __mcheck_cpu_mce_banks_init();

I guess you can merge this one...

> + if (err)
> + goto err_out_mem;
> +
> mce_init_banks();

into this one now.

But let's sort out the bigger issue first.


