Re: [PATCH] ftrace based hard lockup detector

From: Ingo Molnar
Date: Mon Jan 19 2009 - 08:06:58 EST



* Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Sun, 18 Jan 2009, Frederic Weisbecker wrote:
>
> > Like the NMI watchdog, this feature tries to detect hard lockups by
> > watching for a lack of progress in the timer interrupts.
> >
> > You can enable it at boot time by passing the ftrace_hardlockup parameter.
> > I plan to add a debugfs file to enable/disable at runtime.
> >
> > When a hardlockup is detected, it will print a backtrace. Perhaps it
> > would be good to print the locks held from lockdep too?
> >
> > It only supports x86 for the moment, because a generic timer interrupt
> > counter is needed on all archs to make it generic.
> >
> > Signed-off-by: Frederic Weisbecker <fweisbec@xxxxxxxxx>
>
> Hi Frederic,
>
> This seems like a rewrite of the NMI lockup code. In my debugging, I
> simply put ftrace_dump in the NMI lockup, which gives me an ftrace dump
> as soon as NMI detects a lockup. I'm a bit confused about what this gives
> us over that?

This is different from the NMI watchdog in a number of ways:

- it works on all platforms and in all situations where the NMI watchdog
does not work.

- in theory it can detect hard lockups in situations where the NMI
watchdog is disabled, such as suspend/resume or early bootup.
(Early bootup lockups are especially nasty, and the NMI watchdog is
enabled relatively late.)

- it could be extended to detect 'soft' lockups too - i.e. we could have
a one-stop facility to detect all kinds of "kernel does not seem to
progress" lockups.

But it's not as complete as the NMI watchdog: it relies on instrumented
function calls continuing to happen during the lockup - that's not the
case when we get a hard lockup due to a tight, infinite loop that calls
no traced functions.
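
To make that limitation concrete, here is a rough userspace model of the
idea, not the code from the patch. All names (lockup_state,
sim_timer_interrupt, sim_trace_hook) and the 5 second threshold are
assumptions made up for illustration. The hook stands in for the callback
that runs on every instrumented function entry: it compares the timer tick
counter against the value it saw last time and reports once if the counter
has not moved for longer than the threshold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed threshold: report after 5 s without a timer tick. */
#define LOCKUP_THRESHOLD_NS 5000000000ULL

/* Hypothetical per-CPU state; a real detector would keep one per CPU. */
struct lockup_state {
	uint64_t timer_ticks;      /* bumped by the simulated timer interrupt */
	uint64_t last_seen_ticks;  /* tick count when progress was last seen */
	uint64_t last_progress_ns; /* timestamp of that observation */
	bool reported;             /* avoid reporting the same lockup twice */
};

static void lockup_state_init(struct lockup_state *s, uint64_t now_ns)
{
	s->timer_ticks = 0;
	s->last_seen_ticks = 0;
	s->last_progress_ns = now_ns;
	s->reported = false;
}

/* Stand-in for the timer interrupt handler. */
static void sim_timer_interrupt(struct lockup_state *s)
{
	s->timer_ticks++;
}

/*
 * Stand-in for the function-entry hook. Returns true exactly once per
 * stall, at the point where a real detector would print a backtrace.
 * If no instrumented function is ever called, this never runs, so a
 * tight loop without function calls goes unnoticed.
 */
static bool sim_trace_hook(struct lockup_state *s, uint64_t now_ns)
{
	if (s->timer_ticks != s->last_seen_ticks) {
		/* The timer is still ticking: record progress and re-arm. */
		s->last_seen_ticks = s->timer_ticks;
		s->last_progress_ns = now_ns;
		s->reported = false;
		return false;
	}

	if (!s->reported &&
	    now_ns - s->last_progress_ns >= LOCKUP_THRESHOLD_NS) {
		s->reported = true;
		return true;
	}

	return false;
}
```

In this model, detection depends entirely on sim_trace_hook() being
called: as long as traced functions keep running with the tick counter
stalled, the stall is reported once the threshold passes, but a loop that
never enters a traced function never reaches the check at all, which is
the case only an NMI can catch.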

Ingo