Re: [PATCH] watchdog: nohz: don't run watchdog on nohz_full cores

From: Don Zickus
Date: Thu Apr 02 2015 - 10:16:15 EST

On Thu, Apr 02, 2015 at 09:49:45AM -0400, Chris Metcalf wrote:
> >Can I ask how the NO_HZ_FULL technology works from userspace? Is there a
> >system command that has to be sent? How does the kernel know to turn off
> >ticks and trust userspace to do the right thing?
> The NO_HZ_FULL option, when configured into the kernel, lets
> you boot with "nohz_full=1-15" (or whatever cpumask you like),
> typically in conjunction with "isolcpus=1-15". At this point no tasks
> will run on those cores until explicitly placed there by affinity, and
> once there and running in userspace, the kernel will automatically
> get out of their way and not interrupt at all. This lets those tasks
> run with 100.000% of the cpu, which is a requirement for many
> user-space device drivers running high throughput devices.
> (This is typically the use case for the tile architecture customers.)
> So, other than a boot flag, there are no system commands or
> other APIs to deal with.
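Because affinity is the only setup needed beyond the boot flag, a user-space task could place itself on an isolated core roughly as follows. This is a minimal sketch that assumes Linux with glibc's sched_setaffinity(2). pin_to_cpu is a hypothetical helper name, and the caller chooses the CPU, for example one of the cores listed in nohz_full=.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to exactly one CPU.
 * On a system booted with nohz_full=1-15 isolcpus=1-15 the caller would
 * pass one of the isolated cores. Returns 0 on success, or -1 with errno
 * set on failure. */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    if (cpu < 0 || cpu >= CPU_SETSIZE) {
        errno = EINVAL;
        return -1;
    }
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread" */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

A task would typically pin itself once at startup, e.g. pin_to_cpu(3), and then enter its polling loop. From the shell, `taskset -c 3 ./app` does the same thing.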

Ah, I am starting to understand your approach in the original patch better.

> Part of the requirement, though, is that there can be only one task
> bound and runnable on that cpu, otherwise the kernel has to be
> involved to do the context-switching off of the scheduler tick.
> This is why having the standard watchdog kernel thread doesn't
> work in this context.

So, there is no preemption happening, which means the softlockup detector is
rather pointless there. Can interrupts be disabled or handled on that cpu? I am
trying to see whether the hardlockup detector becomes rather silly on those
cpus too.

> I continue to suspect that the right model here is to disable the
> watchdog specifically on the cores that the user has tagged with
> the nohz_full boot argument. I agree that there might be a case
> to be made for leaving the watchdog on conditionally (as suggested
> by Ingo) but it should be possible to have the watchdogs on
> the nohz_full cores be turned off completely if desired.

I think I might be slowly coming around to your way of thinking. I might
request a different patch, though, based on the answers above. Maybe even
create a subset of the online cpus for the watchdog to work off of. The
watchdog would copy the online cpu mask, mask off the nohz cpus, and just
function that way. It would print a loud message for each nohz cpu it was
masking.
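As a rough user-space model of that proposal, the sketch below uses plain 64-bit masks (bit n stands for cpu n) instead of the kernel's struct cpumask. The name watchdog_mask_from is hypothetical, and the message text is illustrative only. It is not an actual kernel log line.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sketch: start from the online mask, clear every cpu that is
 * also in the nohz_full mask, and print a loud message for each cpu removed.
 * A real kernel version would operate on struct cpumask instead. */
static uint64_t watchdog_mask_from(uint64_t online, uint64_t nohz_full)
{
    uint64_t mask = online;

    for (int cpu = 0; cpu < 64; cpu++) {
        uint64_t bit = UINT64_C(1) << cpu;

        if ((online & bit) && (nohz_full & bit)) {
            mask &= ~bit;
            fprintf(stderr, "watchdog: disabled on nohz_full cpu %d\n", cpu);
        }
    }
    return mask;
}
```

With cpus 0-15 online and nohz_full=1-15, watchdog_mask_from(0xFFFF, 0xFFFE) leaves only cpu 0 watched and prints fifteen messages.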

Then perhaps as a debug aid, expose a /proc/sys/kernel/watchdog_cpumask for
folks to modify in case they want to enable the watchdog on the nohz cpus.
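If such a watchdog_cpumask file were exposed, it would presumably accept the same cpu-list syntax as the nohz_full= boot argument. The following sketch parses that syntax into a 64-bit mask. parse_cpulist is hypothetical, and the 64-cpu limit is a simplification of this sketch, not a property of any kernel interface.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser for a cpu list such as "0,2-5" or "1-15\n".
 * Returns 0 and stores the mask in *out on success; returns -1 on
 * malformed input, a reversed range, or a cpu number >= 64. */
static int parse_cpulist(const char *s, uint64_t *out)
{
    uint64_t mask = 0;

    while (*s && *s != '\n') {
        char *end;
        long lo = strtol(s, &end, 10);
        long hi = lo;

        if (end == s || lo < 0)
            return -1;
        if (*end == '-') {              /* range "lo-hi" */
            s = end + 1;
            hi = strtol(s, &end, 10);
            if (end == s || hi < lo)
                return -1;
        }
        if (hi >= 64)
            return -1;
        for (long cpu = lo; cpu <= hi; cpu++)
            mask |= UINT64_C(1) << cpu;
        if (*end == ',')
            end++;                      /* more entries follow */
        else if (*end != '\0' && *end != '\n')
            return -1;                  /* trailing garbage */
        s = end;
    }
    *out = mask;
    return 0;
}
```

Writing "0\n" to the file would, under this sketch, restrict the watchdog to cpu 0, while "0-15" would re-enable it on the nohz cpus as well.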

Just some thoughts.
