Re: [PATCH 2/2] x86: re-enable MCE on secondary CPUS after suspend/resume

From: Andreas Herrmann
Date: Mon Dec 15 2008 - 14:05:34 EST


On Fri, Dec 12, 2008 at 08:06:21PM +0100, Andi Kleen wrote:
> Andreas Herrmann <andreas.herrmann3@xxxxxxx> writes:
>
> > Impact: fix suspend/resume bug with MCE
> >
> > After suspend/resume MCx_CTL registers of secondary CPUs are cleared.
> > (At least that's what I've observed on several systems.)
> > Linux currently only re-initializes MCE on the boot CPU - see mce_resume().
> > Thus after suspend/resume we end up with a system where MCE is active
> > on the boot CPU but switched off on all other CPUs.
> >
> > By calling mce_init() whenever a CPU comes online this problem is
> > solved.
>
> Can you double check that please?
>
> Suspend/resume are supposed to hotunplug all CPUs except the BP and
> then re-online them on resume (with disable_nonboot_cpus()). The
> re-online initializes MCEs in the standard CPU bootup path.

For BP we have

/* On resume clear all MCE state. Don't want to see leftovers from the BIOS.
   Only one CPU is active at this time, the others get readded later using
   CPU hotplug. */
static int mce_resume(struct sys_device *dev)
{
	mce_init(NULL);
	return 0;
}

For APs, mcheck_init() is called on resume. But as the respective bit
for an AP is usually already set in "mce_cpus" after boot (which is
correct, I think), mcheck_init() does not call mce_init(), see:

void __cpuinit mcheck_init(struct cpuinfo_x86 *c)
{
	static cpumask_t mce_cpus = CPU_MASK_NONE;

	mce_cpu_quirks(c);

	if (mce_dont_init ||
	    cpu_test_and_set(smp_processor_id(), mce_cpus) ||
	    !mce_available(c))
=>		return;

	mce_init(NULL);
	mce_cpu_features(c);
}

But we need to call mce_init() on resume to clear all MCE state on the
APs as well. IMHO the best location to call mce_init() for APs is the
cpu notifier.
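
To make the difference concrete, here is a minimal userspace sketch (not
kernel code) of the behaviour described above. It models the static
"mce_cpus" mask and the per-CPU enable state as plain arrays. All model_*
names, the NR_CPUS value and the use_notifier flag are made up for
illustration, and the clearing of the enable state on resume is an
assumption based on the observation from the patch description.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for this model */

/* Model of the per-CPU MCx_CTL state: true = MCE enabled. */
static bool mce_enabled[NR_CPUS];
/* Model of the static "mce_cpus" mask inside mcheck_init(). */
static bool mce_cpus[NR_CPUS];

/* Stand-in for mce_init(): (re-)enable MCE on the given CPU. */
static void model_mce_init(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	mce_enabled[cpu] = true;
}

/* Stand-in for mcheck_init(): only initializes a CPU the first time. */
static void model_mcheck_init(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (mce_cpus[cpu])	/* test-and-set saw the bit already set */
		return;
	mce_cpus[cpu] = true;
	model_mce_init(cpu);
}

/* Initial boot: every CPU passes through mcheck_init() once. */
static void model_boot(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		model_mcheck_init(cpu);
}

/*
 * One suspend/resume cycle. Assumption: the enable state is lost while
 * suspended. CPU 0 plays the boot CPU and is handled by the mce_resume()
 * stand-in; the APs are re-onlined through the mcheck_init() stand-in.
 * With use_notifier set, a hypothetical CPU-online notifier additionally
 * calls the mce_init() stand-in for each AP.
 */
static void model_suspend_resume(bool use_notifier)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		mce_enabled[cpu] = false;

	model_mce_init(0);	/* mce_resume() on the boot CPU */

	for (int cpu = 1; cpu < NR_CPUS; cpu++) {
		model_mcheck_init(cpu);	/* returns early: bit already set */
		if (use_notifier)
			model_mce_init(cpu);
	}
}

/* Number of CPUs that currently have MCE enabled in the model. */
static int model_count_enabled(void)
{
	int n = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		if (mce_enabled[cpu])
			n++;
	return n;
}
```

In this model, model_boot() followed by model_suspend_resume(false)
leaves only the boot CPU enabled, while model_suspend_resume(true)
brings all CPUs back.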


Regards,

Andreas

