[PATCH v2] watchdog: Add a sysctl to disable soft lockup detector
From: Ben Zhang
Date: Wed Dec 04 2013 - 20:56:44 EST
Currently, the soft lockup detector and the hard lockup detector
can only be enabled or disabled together, via the flag variable
watchdog_user_enabled. There is no way to disable only the
soft lockup detector while keeping the hard lockup detector
running.
The hard lockup detector sometimes does not work on an x86
machine with multiple cpus when softlockup_panic is set to 0.
For example:
1. Hard lockup occurs on cpu0 ("cli" followed by an infinite loop).
2. Soft lockup occurs on cpu1 shortly after because cpu1 tries to
run a function on cpu0 via smp_call_function_single().
3. watchdog_timer_fn() detects the soft lockup on cpu1 and
dumps the stack. dump_stack() eventually calls touch_nmi_watchdog()
which sets watchdog_nmi_touch=true for all cpus and sets
watchdog_touch_ts=0 for cpu1.
4. NMI fires on cpu0. watchdog_overflow_callback() sees
watchdog_nmi_touch=true, so it does not do anything except setting
watchdog_nmi_touch=false.
5. watchdog_timer_fn() is called again on cpu1. It sees
watchdog_touch_ts=0, so it reloads it with the current tick. Thus,
is_softlockup() returns false, and soft_watchdog_warn is set to false.
6. Before NMI can fire on cpu0 again with watchdog_nmi_touch=false,
watchdog_timer_fn() reports the soft lockup on cpu1 again
and we go back to #3.
The machine stays locked up and the log shows repeated reports of
soft lockup on cpu1. Therefore, we need a way to disable the soft
lockup check so that the hard lockup detector can reboot the machine.
* Existing boot options for the watchdog:
nmi_watchdog=panic/nopanic/0
softlockup_panic=0/1
nowatchdog
nosoftlockup
* Variables modified by the boot options:
int watchdog_user_enabled;
unsigned int softlockup_panic;
unsigned int hardlockup_panic;
* Existing sysctls at /proc/sys/kernel/... for the watchdog:
nmi_watchdog=0/1
watchdog=0/1
softlockup_panic=0/1
watchdog_thresh=0~60
* Variables modified by the sysctls:
int watchdog_user_enabled;
unsigned int softlockup_panic;
int watchdog_thresh;
This patch adds a new boot option softlockup_detector_enable
and a sysctl at /proc/sys/kernel/softlockup_detector_enable to
allow disabling only the soft lockup detector.
softlockup_detector_enable=1:
This is the default. The soft lockup detector is enabled.
When a soft lockup is detected, a warning message with
debug info is printed. The kernel may be configured to
panic in this case via the sysctl kernel.softlockup_panic.
softlockup_detector_enable=0:
The soft lockup detector is disabled. No warning message is
printed on soft lockup. The kernel does not panic on
soft lockup regardless of the value of kernel.softlockup_panic.
Note kernel.softlockup_detector_enable does not affect
the hard lockup detector.
Signed-off-by: Ben Zhang <benzh@xxxxxxxxxxxx>
---
Documentation/kernel-parameters.txt | 11 +++++++++++
Documentation/sysctl/kernel.txt | 20 ++++++++++++++++++++
include/linux/sched.h | 3 ++-
kernel/sysctl.c | 9 +++++++++
kernel/watchdog.c | 15 +++++++++++++++
5 files changed, 57 insertions(+), 1 deletion(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 50680a5..5678ac3 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2980,6 +2980,17 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
1: Fast pin select (default)
2: ATC IRMode
+ softlockup_detector_enable=
+ [KNL] Should the soft-lockup detector be enabled. If
+ the soft-lockup detector is disabled, no warning
+ message is printed on soft lockup, and the kernel does
+ not panic on soft lockup regardless of the value of
+ softlockup_panic. softlockup_detector_enable does not
+ affect the hard lockup detector.
+ If this parameter is not present, the soft-lockup
+ detector is enabled by default.
+ Format: <integer>
+
softlockup_panic=
[KNL] Should the soft-lockup detector generate panics.
Format: <integer>
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 26b7ee4..209212e 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -70,6 +70,7 @@ show up in /proc/sys/kernel:
- shmall
- shmmax [ sysv ipc ]
- shmmni
+- softlockup_detector_enable
- stop-a [ SPARC only ]
- sysrq ==> Documentation/sysrq.txt
- tainted
@@ -718,6 +719,25 @@ without users and with a dead originative process will be destroyed.
==============================================================
+softlockup_detector_enable:
+
+Should the soft-lockup detector be enabled.
+
+softlockup_detector_enable=1:
+This is the default. The soft lockup detector is enabled.
+When a soft lockup is detected, a warning message with
+debug info is printed. The kernel may be configured to
+panic in this case via the sysctl kernel.softlockup_panic.
+
+softlockup_detector_enable=0:
+The soft lockup detector is disabled. No warning message is
+printed on soft lockup. The kernel does not panic on
+soft lockup regardless of the value of kernel.softlockup_panic.
+Note kernel.softlockup_detector_enable does not affect
+the hard lockup detector.
+
+==============================================================
+
tainted:
Non-zero if the kernel has been tainted. Numeric values, which
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 768b037..6d3749d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -269,7 +269,8 @@ extern void touch_all_softlockup_watchdogs(void);
extern int proc_dowatchdog_thresh(struct ctl_table *table, int write,
void __user *buffer,
size_t *lenp, loff_t *ppos);
-extern unsigned int softlockup_panic;
+extern unsigned int softlockup_panic;
+extern unsigned int softlockup_detector_enable;
void lockup_detector_init(void);
#else
static inline void touch_softlockup_watchdog(void)
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 34a6047..8ae1f36 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -840,6 +840,15 @@ static struct ctl_table kern_table[] = {
.extra2 = &one,
},
{
+ .procname = "softlockup_detector_enable",
+ .data = &softlockup_detector_enable,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec_minmax,
+ .extra1 = &zero,
+ .extra2 = &one,
+ },
+ {
.procname = "nmi_watchdog",
.data = &watchdog_user_enabled,
.maxlen = sizeof (int),
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 4431610..b9594e6 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -80,6 +80,18 @@ static int __init softlockup_panic_setup(char *str)
}
__setup("softlockup_panic=", softlockup_panic_setup);
+unsigned int __read_mostly softlockup_detector_enable = 1;
+
+static int __init softlockup_detector_enable_setup(char *str)
+{
+ unsigned long res;
+ if (kstrtoul(str, 0, &res))
+ res = 1;
+ softlockup_detector_enable = res;
+ return 1;
+}
+__setup("softlockup_detector_enable=", softlockup_detector_enable_setup);
+
static int __init nowatchdog_setup(char *str)
{
watchdog_user_enabled = 0;
@@ -293,6 +305,9 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
return HRTIMER_RESTART;
}
+ if (!softlockup_detector_enable)
+ return HRTIMER_RESTART;
+
/* check for a softlockup
* This is done by making sure a high priority task is
* being scheduled. The task touches the watchdog to
--
1.8.5.1