Re: [patch 0/5] x86: mce: Bugfixes, cleanups and a new CMCI poll version

From: Thomas Gleixner
Date: Fri Jun 08 2012 - 03:49:38 EST


On Thu, 7 Jun 2012, Chen Gong wrote:
>
> But during the CPU online/offline test I found an issue. After *STORM*
> mode is entered, it can't come back from *STORM* mode to normal
> interrupt mode. At least one case triggers this: if a CPU is taken
> offline while *STORM* mode is active, *cmci_storm_on_cpus* can never
> drop to 0, because the bit for the offlined CPU stays set. So we
> should detect this situation and decrement *cmci_storm_on_cpus* at
> the proper time.

Yes, we need to reset the storm state as well I think.
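A minimal userspace C sketch of the accounting problem described above and of the reset on CPU offline, assuming a plain counter plus a per-CPU flag. The names storm_enter(), storm_exit(), storm_cpu_offline(), can_leave_storm_mode(), storm_active[] and STORM_NR_CPUS are hypothetical; only cmci_storm_on_cpus comes from the thread, and real kernel code would need per-CPU variables, atomics and proper locking rather than plain globals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model size; not the kernel's NR_CPUS. */
#define STORM_NR_CPUS 4

static bool storm_active[STORM_NR_CPUS];  /* per-CPU "in storm" flag */
static int cmci_storm_on_cpus;            /* number of CPUs in storm mode */

/* A CPU detects a CMCI storm and switches to polling. */
void storm_enter(int cpu)
{
	if (!storm_active[cpu]) {
		storm_active[cpu] = true;
		cmci_storm_on_cpus++;
	}
}

/* Polling on this CPU found that the storm has subsided. */
void storm_exit(int cpu)
{
	if (storm_active[cpu]) {
		storm_active[cpu] = false;
		assert(cmci_storm_on_cpus > 0);  /* must never underflow */
		cmci_storm_on_cpus--;
	}
}

/*
 * Hypothetical CPU-down hook: an offlined CPU never polls again, so
 * without this reset its flag would keep cmci_storm_on_cpus above zero
 * and the system would never return to interrupt mode.
 */
void storm_cpu_offline(int cpu)
{
	storm_exit(cpu);
}

/* Interrupt-driven CMCI may be restored only when no CPU is stormy. */
bool can_leave_storm_mode(void)
{
	return cmci_storm_on_cpus == 0;
}
```

Calling storm_exit() from the offline hook keeps the flag and the counter in step, so a later storm_exit() on the same CPU cannot decrement the counter twice.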

> BTW, even if I online the *CPU* again in the above situation, normal
> CMCI still doesn't come back, strange.

That's weird.

> I still have another question: when we handle the following case:
> mce_cpu_callback(struct notifier_block *
> 		mce_device_remove(cpu);
> 		break;
> 	case CPU_DOWN_PREPARE:
> -		del_timer_sync(t);
> 		smp_call_function_single(cpu, mce_disable_cpu, &action, 1);
> +		del_timer_sync(t);
> 		break;
>
> Where do we add this timer back? I can't find it in "case CPU_ONLINE".

The timer gets added back via mcheck_cpu_init(), which is called on
the newly onlined cpu from smp_callin().
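A small userspace C model of the timer lifecycle across hotplug, assuming only the ordering stated in this thread: CPU_DOWN_PREPARE runs mce_disable_cpu() on the target CPU via smp_call_function_single() and then calls del_timer_sync(), and the timer is re-armed by mcheck_cpu_init() when smp_callin() runs on the CPU coming back, not by the CPU_ONLINE notifier. All model_* functions and arrays are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model size; not the kernel's NR_CPUS. */
#define TIMER_NR_CPUS 4

static bool model_timer[TIMER_NR_CPUS];  /* is the per-CPU poll timer armed? */
static bool model_mce_on[TIMER_NR_CPUS]; /* is MCE enabled on this CPU? */

/* Stands in for mcheck_cpu_init(): enable MCE and arm the poll timer. */
void model_mcheck_cpu_init(int cpu)
{
	assert(cpu >= 0 && cpu < TIMER_NR_CPUS);
	model_mce_on[cpu] = true;
	model_timer[cpu] = true;
}

/* Stands in for the CPU_DOWN_PREPARE branch with the patch applied. */
void model_cpu_down_prepare(int cpu)
{
	assert(cpu >= 0 && cpu < TIMER_NR_CPUS);
	model_mce_on[cpu] = false; /* smp_call_function_single(cpu, mce_disable_cpu, ...) */
	model_timer[cpu] = false;  /* then del_timer_sync(t) */
}

/* Stands in for the CPU_ONLINE notifier: it does not touch the timer. */
void model_cpu_online(int cpu)
{
	assert(cpu >= 0 && cpu < TIMER_NR_CPUS);
}

/* Stands in for smp_callin() running on the newly onlined CPU. */
void model_smp_callin(int cpu)
{
	model_mcheck_cpu_init(cpu);
}

bool model_timer_armed(int cpu)
{
	return model_timer[cpu];
}
```

The model only captures where the re-arm happens; it says nothing about the CMCI storm state, which, as noted above, also needs resetting when a CPU goes down.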

Thanks,

tglx
