Re: 2.6.14-rc5-rt6 -- False NMI lockup detects
From: Steven Rostedt
Date: Wed Oct 26 2005 - 06:34:45 EST
On Tue, 2005-10-25 at 13:00 -0700, George Anzinger wrote:
> Steven Rostedt wrote:
> > On Tue, 2005-10-25 at 16:28 +0200, Ingo Molnar wrote:
> >
> >>* Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >>
> >>
> >>>Hi Ingo and Thomas,
> >>>
> >>>On some of my machines, I've been experiencing false NMI lockups.
> >>>This usually happens on slower machines, and taking a look into this,
> >>>it seems to be due to a short time where no processes are using
> >>>timers, and the ktimer interrupts aren't needed. So the APIC timer,
> >>>which now is used only for the ktimers, has a five second pause, and
> >>>causes the NMI to go off. The NMI uses the apic timer to determine
> >>>lockups.
> >>
> >>this would be a bug - the jiffy tick should be processed every 1 msec,
> >>regardless of whether there are any ktimers pending. (in the future we
> >>want to use a special ktimer for the jiffy tick, but that's not
> >>implemented yet.)
> >>
> >
> >
> > Isn't the jiffy tick implemented with the PIT when possible? So the apic
> > is only used when a timer is needed. Also note that this "lockup"
> > happens on boot up while things are being initialized, so not many
> > things may be using the timer.
>
> Somewhere in the not too distant past the NMI watchdog was moved from the PIT tick to the APIC
> timer. Might want to move it back, at least for now...
Ingo,

So what's the fix here? The PIT is used for jiffies, and the NMI lockup
detection tests against the APIC timer. So if no ktimer goes off for
5 seconds (usually on slower machines), you get a false positive for a
lockup. Is it really a bug that the APIC doesn't go off for 5 seconds?
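
To make the failure mode concrete, here is a rough user-space sketch of it.
This is not the kernel code: the struct, the function names, and the
once-per-second check are all made up for illustration. The only thing
taken from this thread is the five second threshold. The PIT is assumed to
keep ticking throughout, but the detector never looks at it.

```c
#include <assert.h>

#define LOCKUP_THRESHOLD_SEC 5 /* the five second pause discussed in this thread */

/* Hypothetical detector state; not the kernel's real data structures. */
struct nmi_detector {
    unsigned long last_apic_count; /* APIC timer count seen at the previous check */
    int stalled_sec;               /* consecutive checks with no new APIC tick */
};

/* One watchdog check per simulated second. Returns 1 if it would report a lockup. */
static int nmi_detector_check(struct nmi_detector *d, unsigned long apic_count)
{
    if (apic_count != d->last_apic_count) {
        d->last_apic_count = apic_count;
        d->stalled_sec = 0;
        return 0;
    }
    return ++d->stalled_sec >= LOCKUP_THRESHOLD_SEC;
}

/*
 * Simulate a healthy CPU for `total` seconds. The APIC timer fires once per
 * second, except during [quiet_start, quiet_start + quiet_len) when no ktimer
 * is pending. Returns the second at which a (false) lockup is reported, or -1.
 */
static int simulate_quiet_apic(int quiet_start, int quiet_len, int total)
{
    struct nmi_detector d = { 0, 0 };
    unsigned long apic_count = 0;
    int s;

    assert(quiet_len >= 0 && total >= 0);
    for (s = 0; s < total; s++) {
        int quiet = s >= quiet_start && s < quiet_start + quiet_len;

        if (!quiet)
            apic_count++; /* a ktimer expired, so the APIC timer fired */
        if (nmi_detector_check(&d, apic_count))
            return s;     /* reported even though the CPU never hung */
    }
    return -1;
}
```

In this model, simulate_quiet_apic(3, 5, 20) reports a lockup while the
same run with a four second quiet window does not, which matches the
"only on slower machines" behaviour.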

So, do we move the NMI detect back to the PIT, or use the more generic
solution that I supplied? I actually prefer my generic solution because:

 1. We don't need to worry about which timers we are using at the moment.
 2. Like you said, it can be used later on with things like dynamic
    ticks, or tickless solutions.

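
The patch itself isn't quoted in this mail, so the following is only an
illustration of what "generic" could mean here, not the code I sent. The
assumption is a single progress counter that every tick handler bumps,
whatever its source, with the detector comparing against that counter
instead of the APIC timer count. All names are hypothetical.

```c
#include <assert.h>

#define LOCKUP_THRESHOLD_SEC 5

/* Hypothetical: one counter bumped by every tick handler, PIT or APIC. */
struct progress_watchdog {
    unsigned long progress;  /* incremented by any timer source */
    unsigned long last_seen; /* value observed at the previous check */
    int stalled_sec;         /* consecutive checks with no progress */
};

/* Called from any tick handler; the detector does not care which one. */
static void watchdog_note_progress(struct progress_watchdog *wd)
{
    wd->progress++;
}

/* One check per simulated second. Returns 1 if a lockup would be reported. */
static int watchdog_check(struct progress_watchdog *wd)
{
    if (wd->progress != wd->last_seen) {
        wd->last_seen = wd->progress;
        wd->stalled_sec = 0;
        return 0;
    }
    return ++wd->stalled_sec >= LOCKUP_THRESHOLD_SEC;
}

/* Run for `seconds`; returns 1 if a lockup is reported at any point. */
static int watchdog_run(int seconds, int pit_alive, int apic_alive)
{
    struct progress_watchdog wd = { 0, 0, 0 };
    int s;

    for (s = 0; s < seconds; s++) {
        if (pit_alive)
            watchdog_note_progress(&wd);
        if (apic_alive)
            watchdog_note_progress(&wd);
        if (watchdog_check(&wd))
            return 1;
    }
    return 0;
}
```

With either source alive the detector stays quiet; it only fires when
nothing at all makes progress, which is the case we actually care about.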
-- Steve