Re: [PATCH 0/4] watchdog: avoid races in watchdog_nmi_{en|disable} functions

From: Jiri Olsa
Date: Sun Aug 02 2015 - 18:14:40 EST


On Sat, Aug 01, 2015 at 02:49:21PM +0200, Ulrich Obergfell wrote:
> Originally watchdog_nmi_enable(cpu) and watchdog_nmi_disable(cpu) were
> only called in watchdog thread context. However, the following commits
> utilize these functions outside of watchdog thread context too.
>
> commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca
> Author: Michal Hocko <mhocko@xxxxxxx>
> Date: Tue Sep 24 15:27:30 2013 -0700
>
> watchdog: update watchdog_thresh properly
>
> commit b3738d29323344da3017a91010530cf3a58590fc
> Author: Stephane Eranian <eranian@xxxxxxxxxx>
> Date: Mon Nov 17 20:07:03 2014 +0100
>
> watchdog: Add watchdog enable/disable all functions
>
> Hence, it is now possible that these functions execute concurrently with
> the same 'cpu' argument. This concurrency is problematic because per-cpu
> 'watchdog_ev' can be accessed/modified without adequate synchronization.
>
> The patch series aims to address the above problem. However, instead of
> introducing locks to protect per-cpu 'watchdog_ev' a different approach
> is taken: Invoke these functions by parking and unparking the watchdog
> threads (to ensure they are always called in watchdog thread context).
>
> static struct smp_hotplug_thread watchdog_threads = {
> ...
> .park = watchdog_disable, // calls watchdog_nmi_disable()
> .unpark = watchdog_enable, // calls watchdog_nmi_enable()
> };
>
> Both previously mentioned commits call these functions in a similar way
> and thus in principle contain some duplicate code. The patch series also
> avoids this duplication by providing a commonly usable mechanism.
>
> - Patch 1/4 introduces the watchdog_{park|unpark}_threads functions that
> park/unpark all watchdog threads specified in 'watchdog_cpumask'. They
> are intended to be called inside of kernel/watchdog.c only.
>
> - Patch 2/4 introduces the watchdog_{suspend|resume} functions which can
> be utilized by external callers to deactivate the hard and soft lockup
> detector temporarily.
>
> - Patch 3/4 utilizes watchdog_{park|unpark}_threads to replace some code
> that was introduced by commit 9809b18fcf6b8d8ec4d3643677345907e6b50eca.
>
> - Patch 4/4 utilizes watchdog_{suspend|resume} to replace some code that
> was introduced by commit b3738d29323344da3017a91010530cf3a58590fc.
>
> A few corner cases should be mentioned here for completeness.
>
> - kthread_park() of watchdog/N could hang if cpu N is already locked up.
> However, if watchdog is enabled the lockup will be detected anyway.
>
> - kthread_unpark() of watchdog/N could hang if cpu N got locked up after
> kthread_park(). The occurrence of this scenario should be _very_ rare
> in practice, in particular because it is not expected that temporary
> deactivation will happen frequently, and if it happens at all it is
> expected that the duration of deactivation will be short.
>
> Ulrich Obergfell (4):
> watchdog: introduce watchdog_park_threads() and
> watchdog_unpark_threads()
> watchdog: introduce watchdog_suspend() and watchdog_resume()
> watchdog: use park/unpark functions in update_watchdog_all_cpus()
> watchdog: use suspend/resume interface in fixup_ht_bug()

tested the same way as for I did for my other patch for this issue:
http://marc.info/?l=linux-kernel&m=143834383400803&w=2

works nicely ;-)

thanks,
jirka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/