Re: [PATCH v2] [watchdog] combine nmi_watchdog and softlockup

From: Frederic Weisbecker
Date: Fri Apr 16 2010 - 10:43:13 EST


On Fri, Apr 16, 2010 at 10:12:13AM -0400, Don Zickus wrote:
> On Fri, Apr 16, 2010 at 03:47:14AM +0200, Frederic Weisbecker wrote:
> > > config PERF_EVENTS_NMI
> > > bool
> > > + depends on PERF_EVENTS
> > > help
> > > Arch has support for nmi_watchdog
> >
> >
> >
> > That looks too general. It's more about the fact the arch supports
> > cpu cycle events and generates NMIs on overflow.
>
> I was trying to figure out a way to add the PERF_EVENTS dependency as I
> didn't want to impose it on the CONFIG_NMI_WATCHDOG if that config
> supported softlockup (which doesn't need the PERF_EVENTS).



Yeah and this is fine. I was talking about the help description.



> > I'm confused, do we have two versions of the softlockup
> > detector now? You should drop the older one.
>
> Originally Ingo talked about a migration path, so I was going to support
> the older one in case the new one was having issues, sort of like what he
> suggested about moving the nmi code from arch/x86/kernel/apic/nmi.c to
> kernel/watchdog.c. But I can probably drop the softlockup case as the
> migration isn't as tricky as the nmi case.



Ok.

> > > + return;
> > > + }
> > > +
> > > + cpumask_clear_cpu(this_cpu, to_cpumask(hardlockup_mask));
> >
> >
> >
> > Hmm...this is probably not necessary.
>
> I was just thinking of the case where dispite the WARN above, the cpu
> actually recovered and then failed again separately. But I probably won't
> spend anymore time defending it. :-)



This is really just a corner case, I guess you don't need to
bother with that. It is actually racy against other cpus and adding
a spinlock here (in the everything is fine path) would be an overkill.

In fact, having two per cpu vars named hardlockup_warned and
softlockup_warned would be better than cpumasks. I'm sorry I
suggested you the cpumask, but such per cpu vars will avoid
you dealing with these synchonization issues. And one of the primary
rules is usually to never take a lock from NMIs if we can :)



> > You probably want a backtrace cpu mask here as well
> > (but better don't use the same than the hardlockup thing)
>
> yup.


So actually, per_cpu softlockup_warned would be better :)


> > Also you should half-drop the DETECT_SOFTLOCKUP thing:
> > keep it's definition but drop the ability to choose it from
> > the prompt:
> >
> > config DETECT_SOFTLOCKUP
> > bool
> > depends on DEBUG_KERNEL && !S390
> > default y
> >
> > This way we keep it for compatibility with def_configs, it will
> > enable the WATCHDOG by default if it is "y", we can schedule
> > its removal later.

> I understand the general idea but not quite the implementation idea. I will work
> on it and see what I come up with.


We current have:

config DETECT_SOFTLOCKUP
bool "Blah"
depends on DEBUG_KERNEL && !S390
default y
help
.......

The idea is to remove the "Blah" so that the user can't select it
anymore from make menuconfig, and to remove the help too as it's useless
too.

So that config WATCHDOG can be default y if DETECT_SOFTLOCKUP.
Then if someone comes with a config that has DETECT_SOFTLOCKUP,
it's new implementation (WATCHDOG) will enabled by default.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/