Re: [watchdog] combine nmi_watchdog and softlockup

From: Frederic Weisbecker
Date: Thu Apr 08 2010 - 20:23:07 EST


On Tue, Apr 06, 2010 at 02:59:38PM -0400, Don Zickus wrote:
> > I fear the cpu clock is not going to help you detecting any hard lockups.
> > If you're stuck in an interrupt or an irq disabled loop, your cpu clock is
> > not going to fire.
>
> Yup. I put that in the changelog of some nmi code I posted a few weeks
> ago. I believe Ingo was looking to have a fallback plan in case hw perf
> events wasn't available.


I'm not sure a fallback plan is welcome. What we want is a single
implementation.
If archs already have their own hardlockup detectors, then good
for them as they have this fallback implementation already.

For those who haven't it or who want to profit from a generic
code that does a good deal of the error prone work for them already,
they can just implement a basic perf support for the cpu-cycle
events.

Perf events gives a chance to provide a generic support for hardlockup
detection and supporting a fallback solution from generic code would
be a waste of time.


> With this code, moving to sw perf events, kinda kills the ability to
> detect hard lockups, but you can still detect softlockups.


Totally. Actually the hardlockup part should be even ifdef'ed.
What is not nice is that we have a dependency to CONFIG_PERF_EVENT
for the softlockup detector now while that could be avoided easily.

You have the following layer:

- A task that updates softlockup last touch
- An hrtimer that wakes up this task and logs the irq progression
- A (in the best case) NMI that checks the irq progression and the softlockup
last touch progression.

Why not moving the softlockup last touch progression check from the hrtimer too?
This way you avoid the perf events dependency for the softlockup detector.
The new layer can then be:

- The task
- hrtimer: check softlockup last touch progression (is_softlockup()) and
then wake up the task again. Log irq progression.
- NMI: check irq progression.

This way can just ifdef all the perf events related things, only useful
for the hardlockup detector. And the softlockup detector can still be used
by archs that don't have perf events support.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/