x86_64 NMI watchdog breakage in 2.6.12-rc2-mm3

From: Mikael Pettersson
Date: Tue Apr 19 2005 - 05:56:25 EST


Andi & Andrew,

The "x86_64-switch-smp-bootup-over-to-new-cpu-hotplug-state.patch" in
2.6.12-rc2-mm3 appears to have broken the NMI watchdog. Specifically:

diff -puN arch/x86_64/kernel/nmi.c~x86_64-switch-smp-bootup-over-to-new-cpu-hotplug-state arch/x86_64/kernel/nmi.c
--- 25/arch/x86_64/kernel/nmi.c~x86_64-switch-smp-bootup-over-to-new-cpu-hotplug-state Thu Apr 7 15:15:01 2005
+++ 25-akpm/arch/x86_64/kernel/nmi.c Thu Apr 7 15:15:01 2005
@@ -133,12 +133,6 @@ static int __init check_nmi_watchdog (vo
mdelay((10*1000)/nmi_hz); // wait 10 ticks

for (cpu = 0; cpu < NR_CPUS; cpu++) {
-#ifdef CONFIG_SMP
- /* Check cpu_callin_map here because that is set
- after the timer is started. */
- if (!cpu_isset(cpu, cpu_callin_map))
- continue;
-#endif
if (cpu_pda[cpu].__nmi_count - counts[cpu] <= 5) {
printk("CPU#%d: NMI appears to be stuck (%d)!\n",
cpu,

This is wrong because in general the number of actual CPUs is _less_
than the number of configured CPUs (== NR_CPUS). Hence the code will
now check the NMI counts of non-existent CPUs, complain that they are
stuck, and disable the NMI watchdog. Actually the disablement is broken
in this case, but that's a different issue.

The error is easily reproducible by booting an SMP kernel on a UP box.

/Mikael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/