Re: [PATCH 2/3] watchdog: control hard lockup detection default

From: Ulrich Obergfell
Date: Fri Jul 25 2014 - 04:33:06 EST


> ----- Original Message -----
> From: "Andrew Jones" <drjones@xxxxxxxxxx>
> To: linux-kernel@xxxxxxxxxxxxxxx, kvm@xxxxxxxxxxxxxxx
> Cc: uobergfe@xxxxxxxxxx, dzickus@xxxxxxxxxx, pbonzini@xxxxxxxxxx, akpm@xxxxxxxxxxxxxxxxxxxx, mingo@xxxxxxxxxx
> Sent: Thursday, July 24, 2014 12:13:30 PM
> Subject: [PATCH 2/3] watchdog: control hard lockup detection default

[...]

> The running kernel still has the ability to enable/disable at any
> time with /proc/sys/kernel/nmi_watchdog as usual. However even
> when the default has been overridden /proc/sys/kernel/nmi_watchdog
> will initially show '1'. To truly turn it on one must disable/enable
> it, i.e.
> echo 0 > /proc/sys/kernel/nmi_watchdog
> echo 1 > /proc/sys/kernel/nmi_watchdog

[...]

> @@ -626,15 +665,17 @@ int proc_dowatchdog(struct ctl_table *table, int write,
>          * disabled. The 'watchdog_running' variable check in
>          * watchdog_*_all_cpus() function takes care of this.
>          */
> -       if (watchdog_user_enabled && watchdog_thresh)
> +       if (watchdog_user_enabled && watchdog_thresh) {
> +               watchdog_enable_hardlockup_detector(true);
>                 err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh);
> -       else
> +       } else

[...]


I just realized a possible issue in the above part of the patch:

If we want to give the user the option to override the effect of patch 3/3
via /proc, I think proc_dowatchdog() should enable hard lockup detection _only_
in case of a state transition from 'NOT watchdog_running' to 'watchdog_running'.

        if (watchdog_user_enabled && watchdog_thresh) {
                if (!watchdog_running)          <--- need to add this
                        watchdog_enable_hardlockup_detector(true);
                err = watchdog_enable_all_cpus(old_thresh != watchdog_thresh);
        } else
                ...

The additional 'if (!watchdog_running)' would _require_ the user to perform the
sequence of commands

echo 0 > /proc/sys/kernel/nmi_watchdog
echo 1 > /proc/sys/kernel/nmi_watchdog

to enable hard lockup detection explicitly.
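
The effect of the additional check can be sketched in user space. This is
only a simplified model, not kernel code: 'struct wd_state' and wd_write()
are hypothetical names, and the model assumes the watchdog is already
running with hard lockup detection disabled by default (the situation
patch 3/3 creates).

```c
#include <stdbool.h>

/* Hypothetical, simplified model of the state touched by proc_dowatchdog(). */
struct wd_state {
	bool running;            /* stands in for 'watchdog_running' */
	bool hardlockup_enabled; /* stands in for the hard lockup default */
};

/*
 * Model of one write to /proc with the proposed guard: hard lockup
 * detection is enabled only on a 'NOT running' -> 'running' transition.
 */
static void wd_write(struct wd_state *s, int user_enabled, int thresh)
{
	if (user_enabled && thresh) {
		if (!s->running)
			s->hardlockup_enabled = true;
		s->running = true;
	} else {
		s->running = false;
	}
}
```

In this model, a write that only changes the threshold while the watchdog
is running leaves 'hardlockup_enabled' untouched, whereas a write of 0
followed by a write of 1 sets it.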

I think changing the 'watchdog_thresh' while 'watchdog_running' is true should
_not_ enable hard lockup detection as a side-effect, because a user may have a
'sysctl.conf' entry such as

kernel.watchdog_thresh = ...

or may only want to change the 'watchdog_thresh' on the fly.

I think the following flow of execution could cause such an undesired side-effect:

  proc_dowatchdog
    if (watchdog_user_enabled && watchdog_thresh) {

      watchdog_enable_hardlockup_detector
        hardlockup_detector_enabled = true

      watchdog_enable_all_cpus
        if (!watchdog_running) {
          ...
        } else if (sample_period_changed)
          update_timers_all_cpus
            for_each_online_cpu
              update_timers
                watchdog_nmi_disable
                ...
                watchdog_nmi_enable

                  watchdog_hardlockup_detector_is_enabled
                    return true

                  enable perf counter for hard lockup detection
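
The flow above can be traced with a small user-space model of the patch as
posted, i.e. without the additional check. This is a sketch under
assumptions: 'struct flow_model' and the flow_*() functions are
hypothetical stand-ins for the kernel functions named in the trace, and a
single flag represents the perf counter on all CPUs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel variables named in the trace. */
struct flow_model {
	int  user_enabled;                /* watchdog_user_enabled */
	int  thresh;                      /* watchdog_thresh */
	bool running;                     /* watchdog_running */
	bool hardlockup_detector_enabled; /* set by the new helper */
	bool perf_counter_active;         /* hard lockup perf counter */
};

/* Models watchdog_nmi_enable(): the counter is armed only if allowed. */
static void flow_nmi_enable(struct flow_model *m)
{
	if (m->hardlockup_detector_enabled)
		m->perf_counter_active = true;
}

/* Models watchdog_nmi_disable(). */
static void flow_nmi_disable(struct flow_model *m)
{
	m->perf_counter_active = false;
}

/* Models watchdog_enable_all_cpus(). */
static void flow_enable_all_cpus(struct flow_model *m, bool sample_period_changed)
{
	if (!m->running) {
		flow_nmi_enable(m);
		m->running = true;
	} else if (sample_period_changed) {
		/* update_timers_all_cpus(): disable, then re-enable */
		flow_nmi_disable(m);
		flow_nmi_enable(m);
	}
}

/* Models proc_dowatchdog() as in the posted patch (no guard). */
static void flow_proc_dowatchdog(struct flow_model *m, int new_enabled, int new_thresh)
{
	int old_thresh = m->thresh;

	m->user_enabled = new_enabled;
	m->thresh = new_thresh;

	if (m->user_enabled && m->thresh) {
		m->hardlockup_detector_enabled = true; /* unconditional */
		flow_enable_all_cpus(m, old_thresh != m->thresh);
	} else {
		flow_nmi_disable(m);
		m->running = false;
	}
}
```

Starting from a running watchdog with hard lockup detection disabled, a
write that only changes the threshold ends with 'perf_counter_active' set
in this model, which is the side-effect described above.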

Regards,

Uli