Re: Catching NForce2 lockup with NMI watchdog

From: Mikael Pettersson
Date: Fri Dec 05 2003 - 07:07:23 EST


Mike Fedyk writes:
> On Fri, Dec 05, 2003 at 08:40:58AM +0100, Mikael Pettersson wrote:
> > Jesse Allen writes:
> > > Hi,
> > >
> > > I have a NForce2 board and can easily reproduce a lockup with grep on an IDE
> > > hard disk at UDMA 100. The lockup occurs when both Local APIC + IO-APIC are
> > > enabled. It was suggested to me to use NMI watchdog to catch it. However, the
> > > NMI watchdog doesn't seem to work.
> > >
> > > When I set the kernel parameter "nmi_watchdog=1" I get this message in
> > > /var/log/syslog:
> > > Dec 4 20:10:30 tesore kernel: ..MP-BIOS bug: 8254 timer not connected to
> > > IO-APIC
> > > Dec 4 20:10:30 tesore kernel: timer doesn't work through the IO-APIC -
> > > disabling NMI Watchdog!
> > >
> > > "nmi_watchdog=2" seems to work at first. In /var/log/messages:
> > > Dec 4 20:13:11 tesore kernel: testing NMI watchdog ... OK.
> > > but it still locks up.
> >
> > The NMI watchdog can only handle software lockups, since it relies on
> > the CPU, and for nmi_watchdog=1 the I/O-APIC + bus, still running.
> > Hardware lockups result in, well, hardware lockups :-(
>
> But nmi_watchdog=1 is supposed to work with APIC, or IO-APIC, and it isn't
> for his motherboard. It doesn't increment NMI in /proc/interrupts. And it
> gives the above error message. Isn't that a bug?

nmi_watchdog=1 falls back to nmi_watchdog=2 only if no SMP is detected.
If the I/O-APIC is detected but doesn't work, the fallback does not
happen, and you need to set nmi_watchdog=2 explicitly.
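
To confirm that the watchdog is actually ticking after booting with
nmi_watchdog=2, you can compare the NMI line in /proc/interrupts at two
points in time. Below is a rough sketch of that check, not anything from
the kernel tree. The function names (nmi_counts, nmi_is_ticking, sample)
and the 5-second delay are my own assumptions for illustration, and it
assumes the usual layout where the "NMI:" label is followed by one count
per CPU.

```python
import time


def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counts found in /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                # Stop at the first non-numeric field; some kernels
                # append a textual description after the counts.
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []  # no NMI line present


def nmi_is_ticking(before_text, after_text):
    """True if the total NMI count grew between the two snapshots."""
    return sum(nmi_counts(after_text)) > sum(nmi_counts(before_text))


def sample(path="/proc/interrupts", delay=5.0):
    """Read the file twice, `delay` seconds apart (hypothetical helper)."""
    with open(path) as f:
        before = f.read()
    time.sleep(delay)
    with open(path) as f:
        after = f.read()
    return nmi_is_ticking(before, after)
```

Calling sample() on the affected machine returns True if the count moved
during the interval. A count that stays flat matches the "doesn't
increment NMI in /proc/interrupts" symptom described above.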