RE: [PATCH] mei, release device lock when registering watchdog

From: Winkler, Tomas
Date: Thu Mar 20 2014 - 09:11:06 EST




> I don't know if this also shows a possible latent bug with lockdep. It
> seems like I should be seeing this bug all the time with kernels 3.6.6 and
> onward. Testing indicates that the bug is present in 3.11, and 3.12 but
> for some reason the reporting of the bug comes-and-goes as patches are
> added and removed to the kernel...
>
> P.


Yep, this is solved in mainline now, we need some fix for older kernels, probably also pushing this lock lift to mei_watchdog_register(dev);


/* unlock to preserve correct locking order */
mutex_unlock(&dev->device_lock);
ret = watchdog_register_device(&amt_wd_dev);
mutex_lock(&dev->device_lock);

> ---8<---
>
> Seen in kernels,
>
> 3.6.6: http://lkml.org/lkml/2012/11/13/184
> 3.10.0: where I have seen it (and what led to this patch)
> 3.11: seen in testing (boot & cat /dev/watchdog, OR, reboot)
> 3.12: seen in testing (boot & cat /dev/watchdog, OR, reboot)
> 3.13.0: https://lists.fedoraproject.org/pipermail/test/2013-May/115423.html
> 3.14.0: I can occasionally reproduce this by issuing a reboot.
>
> The lockdep info is:
>
> [ 42.016059] ======================================================
> [ 42.022222] [ INFO: possible circular locking dependency detected ]
> [ 42.028471] 3.13.0 #1 Not tainted
> [ 42.031775] -------------------------------------------------------
> [ 42.038025] cat/2175 is trying to acquire lock:
> [ 42.042543] (&dev->device_lock){+.+.+.}, at: [<ffffffff813fcd0d>]
> mei_wd_ops_start+0x2d/0xf0
> [ 42.051100]
> [ 42.051100] but task is already holding lock:
> [ 42.056917] (&wdd->lock){+.+...}, at: [<ffffffff81494f52>]
> watchdog_start+0x22/0x80
> [ 42.064687]
> [ 42.064687] which lock already depends on the new lock.
> [ 42.064687]
> [ 42.072841]
> [ 42.072841] the existing dependency chain (in reverse order) is:
> [ 42.080302]
> -> #2 (&wdd->lock){+.+...}:
> [ 42.084353] [<ffffffff810b8297>] validate_chain.isra.41+0x4a7/0x910
> [ 42.091222] [<ffffffff810b9566>] __lock_acquire+0x3c6/0xb60
> [ 42.097398] [<ffffffff810ba4f2>] lock_acquire+0xa2/0x120
> [ 42.103312] [<ffffffff8161855d>] mutex_lock_nested+0x5d/0x4c0
> [ 42.109663] [<ffffffff81494f52>] watchdog_start+0x22/0x80
> [ 42.115665] [<ffffffff814951c8>] watchdog_open+0x98/0xf0
> [ 42.121582] [<ffffffff813c0e03>] misc_open+0xb3/0x170
> [ 42.127240] [<ffffffff811d0baf>] chrdev_open+0x9f/0x1c0
> [ 42.133069] [<ffffffff811c96a7>] do_dentry_open+0x1d7/0x2f0
> [ 42.139245] [<ffffffff811c99b1>] finish_open+0x31/0x40
> [ 42.144986] [<ffffffff811dc157>] do_last+0xbd7/0x13c0
> [ 42.150642] [<ffffffff811dc9fe>] path_openat+0xbe/0x640
> [ 42.156468] [<ffffffff811dd79a>] do_filp_open+0x3a/0x90
> [ 42.162298] [<ffffffff811cb248>] do_sys_open+0x128/0x220
> [ 42.168214] [<ffffffff811cb35e>] SyS_open+0x1e/0x20
> [ 42.173695] [<ffffffff816256e9>] system_call_fastpath+0x16/0x1b
> [ 42.180217]
> -> #1 (misc_mtx){+.+.+.}:
> [ 42.184088] [<ffffffff810b8297>] validate_chain.isra.41+0x4a7/0x910
> [ 42.190957] [<ffffffff810b9566>] __lock_acquire+0x3c6/0xb60
> [ 42.197134] [<ffffffff810ba4f2>] lock_acquire+0xa2/0x120
> [ 42.203048] [<ffffffff8161855d>] mutex_lock_nested+0x5d/0x4c0
> [ 42.209396] [<ffffffff813c0b69>] misc_register+0x29/0x120
> [ 42.215399] [<ffffffff814956c5>] watchdog_dev_register+0x35/0x120
> [ 42.222093] [<ffffffff81494d0c>] watchdog_register_device+0xac/0x1f0
> [ 42.229046] [<ffffffff813fd3b9>] mei_watchdog_register+0x19/0x70
> [ 42.235651] [<ffffffff813f6729>] mei_hbm_dispatch+0x269/0xc80
> [ 42.241999] [<ffffffff813f72e4>] mei_irq_read_handler+0xe4/0x900
> [ 42.248609] [<ffffffff813ffee5>] mei_me_irq_thread_handler+0x115/0x310
> [ 42.255735] [<ffffffff810cbb50>] irq_thread_fn+0x20/0x50
> [ 42.261653] [<ffffffff810cbfb3>] irq_thread+0x123/0x150
> [ 42.267482] [<ffffffff8108d90f>] kthread+0xff/0x120
> [ 42.272966] [<ffffffff8162563c>] ret_from_fork+0x7c/0xb0
> [ 42.278881]
> -> #0 (&dev->device_lock){+.+.+.}:
> [ 42.283536] [<ffffffff810b7de7>] check_prevs_add+0x967/0x970
> [ 42.289801] [<ffffffff810b8297>] validate_chain.isra.41+0x4a7/0x910
> [ 42.296670] [<ffffffff810b9566>] __lock_acquire+0x3c6/0xb60
> [ 42.302843] [<ffffffff810ba4f2>] lock_acquire+0xa2/0x120
> [ 42.308760] [<ffffffff8161855d>] mutex_lock_nested+0x5d/0x4c0
> [ 42.315108] [<ffffffff813fcd0d>] mei_wd_ops_start+0x2d/0xf0
> [ 42.321281] [<ffffffff81494f8d>] watchdog_start+0x5d/0x80
> [ 42.327282] [<ffffffff814951c8>] watchdog_open+0x98/0xf0
> [ 42.333194] [<ffffffff813c0e03>] misc_open+0xb3/0x170
> [ 42.338851] [<ffffffff811d0baf>] chrdev_open+0x9f/0x1c0
> [ 42.344681] [<ffffffff811c96a7>] do_dentry_open+0x1d7/0x2f0
> [ 42.350855] [<ffffffff811c99b1>] finish_open+0x31/0x40
> [ 42.356597] [<ffffffff811dc157>] do_last+0xbd7/0x13c0
> [ 42.362255] [<ffffffff811dc9fe>] path_openat+0xbe/0x640
> [ 42.368084] [<ffffffff811dd79a>] do_filp_open+0x3a/0x90
> [ 42.373911] [<ffffffff811cb248>] do_sys_open+0x128/0x220
> [ 42.379824] [<ffffffff811cb35e>] SyS_open+0x1e/0x20
> [ 42.385305] [<ffffffff816256e9>] system_call_fastpath+0x16/0x1b
> [ 42.391826]
> [ 42.391826] other info that might help us debug this:
> [ 42.391826]
> [ 42.399808] Chain exists of:
> &dev->device_lock --> misc_mtx --> &wdd->lock
>
> [ 42.407170] Possible unsafe locking scenario:
> [ 42.407170]
> [ 42.413073] CPU0 CPU1
> [ 42.417588] ---- ----
> [ 42.422105] lock(&wdd->lock);
> [ 42.425259] lock(misc_mtx);
> [ 42.430752] lock(&wdd->lock);
> [ 42.436420] lock(&dev->device_lock);
> [ 42.440178]
> [ 42.440178] *** DEADLOCK ***
> [ 42.440178]
> [ 42.446081] 2 locks held by cat/2175:
> [ 42.449732] #0: (misc_mtx){+.+.+.}, at: [<ffffffff813c0d7e>]
> misc_open+0x2e/0x170
> [ 42.457430] #1: (&wdd->lock){+.+...}, at: [<ffffffff81494f52>]
> watchdog_start+0x22/0x80
> [ 42.465647]
> [ 42.465647] stack backtrace:
> [ 42.469992] CPU: 7 PID: 2175 Comm: cat Not tainted 3.13.0 #1
> [ 42.475636] Hardware name: Intel Corporation 2012 Client Platform/Emerald
> Lake 2, BIOS ACRVMBY1.86C.0078.P00.1201161002 01/16/2012
> [ 42.487340] ffffffff822b2a40 ffff880036475918 ffffffff81613aa0
> ffffffff822ca9e0
> [ 42.494763] ffff880036475958 ffffffff8160defb ffff880036475980
> 0000000000000001
> [ 42.502186] ffff88023af4e400 ffff88023af4ef40 ffff88023af4ef40
> 0000000000000002
> [ 42.509609] Call Trace:
> [ 42.512048] [<ffffffff81613aa0>] dump_stack+0x45/0x56
> [ 42.517172] [<ffffffff8160defb>] print_circular_bug+0x1f9/0x207
> [ 42.523161] [<ffffffff810b7de7>] check_prevs_add+0x967/0x970
> [ 42.528893] [<ffffffff810b7952>] ? check_prevs_add+0x4d2/0x970
> [ 42.534797] [<ffffffff810b8297>] validate_chain.isra.41+0x4a7/0x910
> [ 42.541132] [<ffffffff810b9566>] __lock_acquire+0x3c6/0xb60
> [ 42.546776] [<ffffffff810b9566>] ? __lock_acquire+0x3c6/0xb60
> [ 42.552591] [<ffffffff810ba4f2>] lock_acquire+0xa2/0x120
> [ 42.557974] [<ffffffff813fcd0d>] ? mei_wd_ops_start+0x2d/0xf0
> [ 42.563790] [<ffffffff8161855d>] mutex_lock_nested+0x5d/0x4c0
> [ 42.569605] [<ffffffff813fcd0d>] ? mei_wd_ops_start+0x2d/0xf0
> [ 42.575421] [<ffffffff810b8dbd>] ? trace_hardirqs_on+0xd/0x10
> [ 42.581235] [<ffffffff813fcd0d>] ? mei_wd_ops_start+0x2d/0xf0
> [ 42.587052] [<ffffffff81494f52>] ? watchdog_start+0x22/0x80
> [ 42.592697] [<ffffffff81494f52>] ? watchdog_start+0x22/0x80
> [ 42.598341] [<ffffffff813fcd0d>] mei_wd_ops_start+0x2d/0xf0
> [ 42.603984] [<ffffffff81494f8d>] watchdog_start+0x5d/0x80
> [ 42.609453] [<ffffffff814951c8>] watchdog_open+0x98/0xf0
> [ 42.614836] [<ffffffff813c0e03>] misc_open+0xb3/0x170
> [ 42.619961] [<ffffffff811d0baf>] chrdev_open+0x9f/0x1c0
> [ 42.625257] [<ffffffff811c96a7>] do_dentry_open+0x1d7/0x2f0
> [ 42.630901] [<ffffffff811d7e62>] ? __inode_permission+0x52/0xc0
> [ 42.636892] [<ffffffff811d0b10>] ? cdev_put+0x30/0x30
> [ 42.642016] [<ffffffff811c99b1>] finish_open+0x31/0x40
> [ 42.647226] [<ffffffff811dc157>] do_last+0xbd7/0x13c0
> [ 42.652349] [<ffffffff811d7ee8>] ? inode_permission+0x18/0x50
> [ 42.658165] [<ffffffff811d8631>] ? link_path_walk+0x71/0x850
> [ 42.663894] [<ffffffff811dc9fe>] path_openat+0xbe/0x640
> [ 42.669190] [<ffffffff810b9566>] ? __lock_acquire+0x3c6/0xb60
> [ 42.675005] [<ffffffff810a0a15>] ? sched_clock_cpu+0xb5/0x100
> [ 42.680822] [<ffffffff811dd79a>] do_filp_open+0x3a/0x90
> [ 42.686118] [<ffffffff8161c187>] ? _raw_spin_unlock+0x27/0x30
> [ 42.691934] [<ffffffff811eb20f>] ? __alloc_fd+0xaf/0x140
> [ 42.697318] [<ffffffff811cb248>] do_sys_open+0x128/0x220
> [ 42.702700] [<ffffffff811cb35e>] SyS_open+0x1e/0x20
> [ 42.707652] [<ffffffff816256e9>] system_call_fastpath+0x16/0x1b
>
> The mei_watchdog_register() only registers the watchdog.
> mei_watchdog_unregister() is called outside of the dev->device_lock so it
> looks like mei_watchdog_register() can as well.
>
> Signed-off-by: Prarit Bhargava <prarit@xxxxxxxxxx>
> Cc: Tomas Winkler <tomas.winkler@xxxxxxxxx>
> ---
> drivers/misc/mei/hbm.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/misc/mei/hbm.c b/drivers/misc/mei/hbm.c
> index 28cd74c..fdc83ff 100644
> --- a/drivers/misc/mei/hbm.c
> +++ b/drivers/misc/mei/hbm.c
> @@ -491,7 +491,9 @@ static void mei_hbm_cl_connect_res(struct mei_device
> *dev,
>
> if (is_treat_specially_client(&dev->wd_cl, rs)) {
> dev_dbg(&dev->pdev->dev, "successfully connected to WD
> client.\n");
> + mutex_unlock(&dev->device_lock);
> mei_watchdog_register(dev);
> + mutex_lock(&dev->device_lock);
>
> return;
> }
> --
> 1.7.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/