Re: troubleshooting/debugging hard locks

From: Lee Howard
Date: Wed May 14 2008 - 23:43:54 EST


Zan Lynx wrote:
> On Wed, 2008-05-14 at 15:43 -0700, Ray Lee wrote:
>> On Wed, May 14, 2008 at 12:27 PM, Lee Howard <faxguy@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> But, without kernel messages indicating where to look to debug... what is
>>> the best approach to start troubleshooting and debugging this condition? Is
>>> there some general debug feature that can be enabled in the kernel that
>>> would help hone in on the culprit?
>>
>> There's something called the NMI watchdog, that will print debugging
>> messages out if it finds the system has hard locked. The short version
>> is that you should add "nmi_watchdog=1" (no quotes) to the line in
>> GRUB that has the kernel options. That assumes you have an APIC on the
>> system. If that's not the case (you're on Uniprocessor, and no APIC)
>> then you can try nmi_watchdog=2 instead. That'll only work on some
>> systems, though.
>>
>> Better docs (than my cheesy writeup) are in
>> Documentation/nmi_watchdog.txt in the kernel source distribution.
>
> I was once told to add these to the kernel command line as well when
> using NMI watchdog and they do seem to help it trigger more reliably:
>
> "idle=poll nohz=off"

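For anyone following along, here is a rough sketch of how one might confirm that the options quoted above actually reached the running kernel and that NMIs are being delivered. It is only an illustration: the function names are made up for this example, and it assumes the NMI row in /proc/interrupts begins with "NMI:" followed by one count per CPU, which may differ between kernel versions and architectures.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    opts = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        opts[name] = value if sep else None
    return opts


def missing_watchdog_options(cmdline):
    """Return the options suggested in this thread that are absent from cmdline."""
    opts = parse_cmdline(cmdline)
    missing = []
    # Either watchdog mode mentioned in the thread counts as "enabled" here.
    if opts.get("nmi_watchdog") not in ("1", "2"):
        missing.append("nmi_watchdog=1 (or =2)")
    if opts.get("idle") != "poll":
        missing.append("idle=poll")
    if opts.get("nohz") != "off":
        missing.append("nohz=off")
    return missing


def nmi_counts(interrupts_text):
    """Return per-CPU NMI counts from /proc/interrupts text, or [] if no NMI row."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            # Keep only the numeric columns; any trailing description is skipped.
            return [int(f) for f in fields[1:] if f.isdigit()]
    return []
```

To use it, pass in the contents of /proc/cmdline and /proc/interrupts read as text. If the watchdog is active, the counts returned by nmi_counts() would be expected to grow between two readings taken a few seconds apart.
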
Thank you to both Ray and Zan. This was very helpful, and I think it has gotten me what I needed:

"serial8250: too much work for irq16"

Interestingly, CTRL-SysRq-H will now wake it back up. Things run normally afterwards, and the hard lock never occurs.

Thanks,

Lee.