[PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions

From: Ulrich Obergfell
Date: Sat Aug 01 2015 - 08:46:19 EST


Originally watchdog_nmi_enable(cpu) and watchdog_nmi_disable(cpu) were
only called in watchdog thread context. However, the following commits
utilize these functions outside of watchdog thread context too.

commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca
Author: Michal Hocko <mhocko@xxxxxxx>
Date: Tue Sep 24 15:27:30 2013 -0700

watchdog: update watchdog_thresh properly

commit b3738d29323344da3017a91010530cf3a58590fc
Author: Stephane Eranian <eranian@xxxxxxxxxx>
Date: Mon Nov 17 20:07:03 2014 +0100

watchdog: Add watchdog enable/disable all functions

Hence, it is now possible that these functions execute concurrently with
the same 'cpu' argument. This concurrency is problematic because per-cpu
'watchdog_ev' can be accessed/modified without adequate synchronization.

The patch series aims to address the above problem. However, instead of
introducing locks to protect per-cpu 'watchdog_ev' a different approach
is taken: Invoke these functions by parking and unparking the watchdog
threads (to ensure they are always called in watchdog thread context).

static struct smp_hotplug_thread watchdog_threads = {
...
.park = watchdog_disable, // calls watchdog_nmi_disable()
.unpark = watchdog_enable, // calls watchdog_nmi_enable()
};

Both previously mentioned commits call these functions in a similar way
and thus in principle contain some duplicate code. The patch series also
avoids this duplication by providing a commonly usable mechanism.

- Patch 1/4 introduces the watchdog_{park|unpark}_threads functions that
park/unpark all watchdog threads specified in 'watchdog_cpumask'. They
are intended to be called inside of kernel/watchdog.c only.

- Patch 2/4 introduces the watchdog_{suspend|resume} functions which can
be utilized by external callers to deactivate the hard and soft lockup
detector temporarily.

- Patch 3/4 utilizes watchdog_{park|unpark}_threads to replace some code
that was introduced by commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca.

- Patch 4/4 utilizes watchdog_{suspend|resume} to replace some code that
was introduced by commit b3738d29323344da3017a91010530cf3a58590fc.

A few corner cases should be mentioned here for completeness.

- kthread_park() of watchdog/N could hang if cpu N is already locked up.
However, if watchdog is enabled the lockup will be detected anyway.

- kthread_unpark() of watchdog/N could hang if cpu N got locked up after
kthread_park(). The occurrence of this scenario should be _very_ rare
in practice, in particular because it is not expected that temporary
deactivation will happen frequently, and if it happens at all it is
expected that the duration of deactivation will be short.

Ulrich Obergfell (4):
watchdog: introduce watchdog_park_threads() and
watchdog_unpark_threads()
watchdog: introduce watchdog_suspend() and watchdog_resume()
watchdog: use park/unpark functions in update_watchdog_all_cpus()
watchdog: use suspend/resume interface in fixup_ht_bug()

arch/x86/kernel/cpu/perf_event_intel.c | 9 +-
include/linux/nmi.h | 2 +
include/linux/watchdog.h | 8 --
kernel/watchdog.c | 157 +++++++++++++++++++--------------
4 files changed, 100 insertions(+), 76 deletions(-)

--
1.7.11.7

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/