Re: [PATCH v2] watchdog: nohz: don't run watchdog on nohz_full cores

From: Frederic Weisbecker
Date: Thu Apr 02 2015 - 11:20:00 EST

On Mon, Mar 30, 2015 at 04:02:06PM -0400, Don Zickus wrote:
> On Mon, Mar 30, 2015 at 03:32:55PM -0400, Chris Metcalf wrote:
> > On 03/30/2015 03:12 PM, Don Zickus wrote:
> > >On Mon, Mar 30, 2015 at 02:51:05PM -0400, cmetcalf@xxxxxxxxxx wrote:
> > >>From: Chris Metcalf <cmetcalf@xxxxxxxxxx>
> > >>
> > >>Running watchdog can be a helpful debugging feature on regular
> > >>cores, but it's incompatible with nohz_full, since it forces
> > >>regular scheduling events. Accordingly, just exit out immediately
> > >>from any nohz_full core.
> > >>
> > >>An alternate approach would be to add a flags field or function to
> > >>smp_hotplug_thread to control on which cores the percpu threads
> > >>are created, but it wasn't clear that much mechanism was useful.
> > >Hi Chris,
> > >
> > >It seems like the correct solution would be to hook into the idle_loop
> > >somehow. If the cpu is idle, then it seems unlikely that a lockup could
> > >occur.
> >
> > With nohz_full, though, the cpu might be running userspace code
> > with the intention of keeping kernel ticks disabled. Even returning
> > to kernel mode to try to figure out if we "should" be running the
> > watchdog on a given core will induce exactly the kind of interrupts
> > that nohz_full is designed to prevent.
> >
> > My assumption is generally that nohz_full cores don't spend a lot of
> > time in the kernel anyway, as they are optimized for user space.
> >
> > I guess you could imagine doing something per-cpu on the nohz_full
> > cores where we effectively call watchdog_disable() whenever a
> > nohz_full core enters userspace, and watchdog_enable() whenever it
> > enters the kernel. We could add some per-cpu state in the watchdog
> > code to track whether that core was currently enabled or disabled
> > to avoid double-enabling or double-disabling. I would think
> > context_tracking_user_exit()/_enter() would be the place to do this.
> >
> > This feels like a lot of overhead, potentially. Thoughts?
>
> A few months ago I might have thought that a reasonable approach. But
> recently we have added code to make the watchdog an all or nothing approach
> across the system. This might make it difficult to do what you are
> suggesting.
> I do not know enough about the nohz code to know what the right approach is
> here. Perhaps Frederic can enlighten me?

Well, cancelling/rearming a timer on every userspace round trip sounds like way
too much overhead to me :-)
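
To make the cost concrete, here is a rough userspace model (not kernel code)
of the enable-on-kernel-entry / disable-on-user-entry idea with the per-cpu
"enabled" flag Chris describes. Every model_* name and the wd_state struct are
hypothetical and exist only for this illustration; the real watchdog and
context tracking code look nothing like this.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-cpu state; an assumption for this sketch only. */
struct wd_state {
	bool enabled;            /* is the modelled watchdog timer armed? */
	unsigned long timer_ops; /* arm/cancel operations performed so far */
};

static struct wd_state wd[NR_CPUS];

/* Arm the modelled timer; the flag guards against double-enabling. */
static void model_watchdog_enable(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (wd[cpu].enabled)
		return;
	wd[cpu].enabled = true;
	wd[cpu].timer_ops++;
}

/* Cancel the modelled timer; the flag guards against double-disabling. */
static void model_watchdog_disable(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!wd[cpu].enabled)
		return;
	wd[cpu].enabled = false;
	wd[cpu].timer_ops++;
}

/* Stand-ins for hooks on the user/kernel boundary. */
static void model_user_enter(int cpu)
{
	model_watchdog_disable(cpu);
}

static void model_user_exit(int cpu)
{
	model_watchdog_enable(cpu);
}
```

Starting from an enabled watchdog, each model_user_enter()/model_user_exit()
pair costs one cancel plus one arm, so 1000 round trips mean 2000 timer
operations on that core. The flag avoids redundant work but does nothing about
that per-transition cost.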

But Ingo's suggestion to disable it properly (only on nohz_full cores) looks good.
We should still be able to re-enable it everywhere with
"sysctl -w kernel.watchdog=1", and you need to print a warning at boot that it
is disabled on those cores.
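
Something along these lines, as a userspace model only (not a patch). The
nohz_full_cpu[] and watchdog_on[] arrays and the model_* functions are
assumptions made up for this sketch; the message text is just a placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NR_CPUS 8

/* Hypothetical stand-ins for the nohz_full mask and per-cpu watchdog state. */
static bool nohz_full_cpu[NR_CPUS];
static bool watchdog_on[NR_CPUS];

/*
 * Boot-time default: run the watchdog everywhere except on nohz_full
 * cores, and warn once if any core was skipped. Returns the number of
 * cores left without a watchdog.
 */
static int model_watchdog_init(void)
{
	int skipped = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		watchdog_on[cpu] = !nohz_full_cpu[cpu];
		if (nohz_full_cpu[cpu])
			skipped++;
	}
	if (skipped)
		printf("watchdog: disabled by default on %d nohz_full core(s)\n",
		       skipped);
	return skipped;
}

/*
 * Models "sysctl -w kernel.watchdog=<val>": an explicit write applies to
 * every core, nohz_full or not, so the admin can override the default.
 */
static void model_sysctl_watchdog(int val)
{
	assert(val == 0 || val == 1);
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		watchdog_on[cpu] = (val == 1);
}
```

The point is that the nohz_full exclusion is only a default chosen at boot,
while the sysctl keeps its all-or-nothing meaning across the system.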