Re: 2.6.30-rc1 - nmi_watchdog broken?

From: Ingo Molnar
Date: Tue Apr 14 2009 - 09:23:45 EST



* Ed Tomlinson <edt@xxxxxx> wrote:

> On Tuesday 14 April 2009 04:42:32 Ingo Molnar wrote:
> >
> > * Ed Tomlinson <edt@xxxxxx> wrote:
> >
> > > Hi,
> > >
> > > I've been having fun finding bugs in 30-rc1. One of them is a
> > > hard freeze. I've not seen this type of problem on this hardware
> > > before 30-rc1, so I doubt it's hardware. The best way I know
> > > to debug a hard hang is with the nmi_watchdog. I just cannot get
> > > it to work.
> >
> > [ Btw., have you tried CONFIG_PROVE_LOCKING=y - does it produce
> > anything before or at the hard lockup point? ]
> >
> > > The system is a 3-core AMD CPU on a 790GX chipset.
> > >
> > > If I boot with nmi_watchdog=1 it complains that the lapic is not
> > > available and the boot stops. Same problem if I change the
> > > clocksource to tsc. If I disable highres timers it panics. If I
> > > use nmi_watchdog=2 it panics. Am I doing something wrong or have I
> > > hit a bug?
> > >
> > > Logs of boots with and without highres timers inlined below.
> >
> > hm, nmi_watchdog=1 acting funny is not unheard of. But
> > nmi_watchdog=2 should really work. How does it panic, do
> > you have a capture of that?
>
> I had not tried nmi_watchdog=2 highres=off. This works. Looks
> like there is a conflict between highres timers and nmi_watchdog
> here.

Yes. Both use the same limited local APIC (lapic) resource, so we
get one or the other, not both.

( Might be fixable once we migrate the NMI watchdog code over to
perfcounters. )

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/