On Mon, Mar 30, 2015 at 02:51:05PM -0400, cmetcalf@xxxxxxxxxx wrote:
From: Chris Metcalf <cmetcalf@xxxxxxxxxx>

Hi Chris,
Running watchdog can be a helpful debugging feature on regular
cores, but it's incompatible with nohz_full, since it forces
regular scheduling events. Accordingly, just exit out immediately
from any nohz_full core.
An alternate approach would be to add a flags field or function to
smp_hotplug_thread to control on which cores the percpu threads
are created, but it wasn't clear that much mechanism was useful.
It seems like the correct solution would be to hook into the idle_loop
somehow. If the cpu is idle, then it seems unlikely that a lockup could
occur.
My fear with this approach is that a lockup would occur on the nohz cpu and
go undetected because the watchdog is disabled on that cpu. Further, no
printk is emitted to even indicate that a cpu is disabled, making it more
difficult to debug.
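To make the printk concern concrete, here is a rough userspace sketch of the decision, not kernel code and not a proposed patch. The names model_nohz_full_cpu() and model_watchdog_enable() are hypothetical stand-ins, the nohz_full set is modelled as a plain bitmask, and snprintf() stands in for whatever log call would actually be used. It only shows a skip path that leaves a visible notice instead of exiting silently.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for tick_nohz_full_cpu(): the nohz_full set is
 * modelled as a bitmask passed in by the caller (an assumption made only
 * for this sketch).
 */
static bool model_nohz_full_cpu(unsigned long nohz_mask, unsigned int cpu)
{
	if (cpu >= sizeof(nohz_mask) * CHAR_BIT)
		return false;
	return (nohz_mask >> cpu) & 1UL;
}

/*
 * Returns true if the watchdog would be armed on this cpu. When the cpu
 * is skipped, a one-line notice is written to buf so the decision is
 * visible to whoever is debugging; otherwise buf is left empty.
 */
static bool model_watchdog_enable(unsigned long nohz_mask, unsigned int cpu,
				  char *buf, size_t len)
{
	if (model_nohz_full_cpu(nohz_mask, cpu)) {
		snprintf(buf, len, "watchdog: disabled on nohz_full cpu %u", cpu);
		return false;
	}
	if (len)
		buf[0] = '\0';
	return true;
}
```

With a mask of 0x4, cpu 2 is skipped and a notice is recorded, while cpu 1 is armed and the buffer stays empty.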
Signed-off-by: Chris Metcalf <cmetcalf@xxxxxxxxxx>
---
kernel/watchdog.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 3174bf8e3538..8a46d9d8a66f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -19,6 +19,7 @@
#include <linux/sysctl.h>
#include <linux/smpboot.h>
#include <linux/sched/rt.h>
+#include <linux/tick.h>
#include <asm/irq_regs.h>
#include <linux/kvm_para.h>
@@ -431,6 +432,10 @@ static void watchdog_enable(unsigned int cpu)
 	hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	hrtimer->function = watchdog_timer_fn;
+	/* nohz_full cpus do not do watchdog checking. */
+	if (tick_nohz_full_cpu(cpu))
+		do_exit(0);
+
 	/* Enable the perf event */
 	watchdog_nmi_enable(cpu);
--
2.1.2