Re: w1: 3.14-rc7 - possible recursive locking detected

From: Dr. H. Nikolaus Schaller
Date: Mon Apr 14 2014 - 09:10:36 EST


Hi David,
I have tested and it appears to suppress the locking issue on our
GTA04 board.

BR,
Nikolaus

Am 14.04.2014 um 00:24 schrieb David Fries:

> Belisko Marek,
> Here is a possible solution, could you give it a try and report back?
>
> Greg Kroah-Hartman,
> Evgeniy asked me to look into this report. I don't have the
> reporter's hardware configuration, but I wouldn't think that would be
> needed, just some w1 bus master (even W1_MASTER_GPIO might work), then
> loading the slave device and manually adding a slave device with that
> family id. Even then I didn't reproduce the reported recursive
> locking error. I saw unrelated locking reports, but not this one. I
> wrote up the included patch, but that undoes the notify changes that
> you added earlier in commit 47eba33a0997fc7, and I wanted to ask about
> that commit. Specifically these two lines,
>
> err = device_register(&sl->dev);
> ...
> + dev_set_uevent_suppress(&sl->dev, false);
> + kobject_uevent(&sl->dev.kobj, KOBJ_ADD);
>
> Wouldn't the default be to not suppress? Nothing in the W1 system
> enables suppressing so is that even needed? (I'm fine with saying
> it's a good idea).
> device_register at some point must call device_add and device_add
> calls kobject_uevent(&dev->kobj, KOBJ_ADD);
> so doesn't the KOBJ_ADD send the add a second time? As in it
> shouldn't be needed?
> Can the suppress be called before device_register to avoid the
> automatic notify, then after it returns setup the slave device as this
> patch does to avoid this problem report, and then call the KOBJ_ADD to
> make everything happy?
>
> I'm based on commit ce7613db2d8d4d5af2587a.
>
> On Tue, Apr 08, 2014 at 09:47:50PM +0200, Belisko Marek wrote:
>> Hi,
>>
>> On Mon, Mar 24, 2014 at 1:01 PM, Evgeniy Polyakov <zbr@xxxxxxxxxxx> wrote:
>>> Hi everyone
>>>
>>> 24.03.2014, 01:50, "David Fries" <David@xxxxxxxxx>:
>>>> On Mon, Mar 17, 2014 at 10:38:13PM +0100, Belisko Marek wrote:
>>>>> booting latest 3.14-rc7 on gta04 board gives following warning:
>>>>>
>>>>> [ 3.101409] Driver for 1-wire Dallas network protocol.
>>>>> [ 3.109649] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
>>>>> [ 3.400146] CPU0
>>>>> [ 3.402709] ----
>>>>> [ 3.405273] lock(&(&priv->bus_notifier)->rwsem);
>>>>> [ 3.410308] lock(&(&priv->bus_notifier)->rwsem);
>>>>> [ 3.415374]
>>>>> [ 3.415374] *** DEADLOCK ***
>>>>>
>>>>> AFAIK we didn't see that on (at least 3/14-rc2).
>>>>
>>>> The last drivers/w1 change went in before v3.14-rc1, so if it is
>>>> something in the w1 code, it has either been there or something else
>>>> in the rest of the kernel changed to expose it. I'm using the ds2490
>>>> USB w1 dongle, I didn't see that problem when enabling lock debugging
>>>> on 3.14.0-rc5+ which includes some changes I'm working on.
>>>>
>>>> https://github.com/dfries/linux.git w1_rework
>>>
>>> Can it be a bug in omap w1 driver?
>> We test with actual Linus master and it's even worse (it hangs completely).
>> Trace look little bit different and it only happens when booting via
>> device tree (not with board file).
>>
>> [ 2.807434] Driver for 1-wire Dallas network protocol.
>> [ 2.814117] omap_hdq omap_hdq: OMAP HDQ Hardware Rev 0.5. Driver in
>> Interrupt mode
>> [ 3.034271] call_modprobe: w1-family-0x1
>> [ 3.051818]
>> [ 3.053405] =============================================
>> [ 3.059051] [ INFO: possible recursive locking detected ]
>> [ 3.064727] 3.14.0-gta04 #556 Not tainted
>> [ 3.068939] ---------------------------------------------
>> [ 3.074615] w1_bus_master1/795 is trying to acquire lock:
>> [ 3.080261] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
>> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
>> [ 3.090911]
>> [ 3.090911] but task is already holding lock:
>> [ 3.097045] (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
>> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
>> [ 3.107666]
>> [ 3.107666] other info that might help us debug this:
>> [ 3.114501] Possible unsafe locking scenario:
>> [ 3.114501]
>> [ 3.120727] CPU0
>> [ 3.123291] ----
>> [ 3.125854] lock(&(&priv->bus_notifier)->rwsem);
>> [ 3.130889] lock(&(&priv->bus_notifier)->rwsem);
>> [ 3.135925]
>> [ 3.135925] *** DEADLOCK ***
>> [ 3.135925]
>> [ 3.142150] May be due to missing lock nesting notation
>> [ 3.142150]
>> [ 3.149291] 2 locks held by w1_bus_master1/795:
>> [ 3.154052] #0: (&dev->mutex#3){+.+.+.}, at: [<c03ba1c8>]
>> w1_attach_slave_device+0xc4/0x1c8
>> [ 3.163055] #1: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at:
>> [<c0058ab0>] __blocking_notifier_call_chain+0x28/0x58
>> [ 3.174163]
>> [ 3.174163] stack backtrace:
>> [ 3.178741] CPU: 0 PID: 795 Comm: w1_bus_master1 Not tainted
>> 3.14.0-gta04 #556
>> [ 3.186370] [<c0013ca0>] (unwind_backtrace) from [<c0010f68>]
>> (show_stack+0x10/0x14)
>> [ 3.194488] [<c0010f68>] (show_stack) from [<c04f0900>]
>> (dump_stack+0x6c/0x88)
>> [ 3.202117] [<c04f0900>] (dump_stack) from [<c00746bc>]
>> (print_deadlock_bug+0xc0/0xf0)
>> [ 3.210449] [<c00746bc>] (print_deadlock_bug) from [<c0075f80>]
>> (validate_chain.isra.29+0x4dc/0x534)
>> [ 3.220031] [<c0075f80>] (validate_chain.isra.29) from [<c0076dac>]
>> (__lock_acquire+0x728/0x834)
>> [ 3.229248] [<c0076dac>] (__lock_acquire) from [<c00774b8>]
>> (lock_acquire+0xf4/0x118)
>> [ 3.237487] [<c00774b8>] (lock_acquire) from [<c04f8e04>]
>> (down_read+0x24/0x34)
>> [ 3.245178] [<c04f8e04>] (down_read) from [<c0058ab0>]
>> (__blocking_notifier_call_chain+0x28/0x58)
>> [ 3.254516] [<c0058ab0>] (__blocking_notifier_call_chain) from
>> [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18)
>> [ 3.265563] [<c0058af4>] (blocking_notifier_call_chain) from
>> [<c0301db8>] (device_add+0x1f4/0x368)
>> [ 3.274993] [<c0301db8>] (device_add) from [<c03055cc>]
>> (platform_device_add+0x138/0x1c8)
>> [ 3.283569] [<c03055cc>] (platform_device_add) from [<c03bd124>]
>> (w1_bq27000_add_slave+0x44/0x74)
>> [ 3.292907] [<c03bd124>] (w1_bq27000_add_slave) from [<c03b9ed8>]
>> (w1_bus_notify.part.0+0x48/0xc8)
>> [ 3.302337] [<c03b9ed8>] (w1_bus_notify.part.0) from [<c04fbe1c>]
>> (notifier_call_chain+0x38/0x68)
>> [ 3.311645] [<c04fbe1c>] (notifier_call_chain) from [<c0058acc>]
>> (__blocking_notifier_call_chain+0x44/0x58)
>> [ 3.321899] [<c0058acc>] (__blocking_notifier_call_chain) from
>> [<c0058af4>] (blocking_notifier_call_chain+0x14/0x18)
>> [ 3.332946] [<c0058af4>] (blocking_notifier_call_chain) from
>> [<c0301db8>] (device_add+0x1f4/0x368)
>> [ 3.342346] [<c0301db8>] (device_add) from [<c03b9ad0>]
>> (__w1_attach_slave_device+0x9c/0x134)
>> [ 3.351318] [<c03b9ad0>] (__w1_attach_slave_device) from
>> [<c03ba238>] (w1_attach_slave_device+0x134/0x1c8)
>> [ 3.361480] [<c03ba238>] (w1_attach_slave_device) from [<c03ba348>]
>> (w1_slave_found+0x7c/0x98)
>> [ 3.370513] [<c03ba348>] (w1_slave_found) from [<c03bc818>]
>> (omap_w1_search_bus+0x54/0x5c)
>> [ 3.379211] [<c03bc818>] (omap_w1_search_bus) from [<c03bc758>]
>> (w1_search_devices+0x3c/0x48)
>> [ 3.388153] [<c03bc758>] (w1_search_devices) from [<c03baaec>]
>> (w1_search_process_cb+0x64/0x108)
>> [ 3.397399] [<c03baaec>] (w1_search_process_cb) from [<c03bac60>]
>> (w1_process+0x68/0x14c)
>> [ 3.405975] [<c03bac60>] (w1_process) from [<c0054bdc>] (kthread+0xdc/0xf0)
>> [ 3.413299] [<c0054bdc>] (kthread) from [<c000dca8>]
>> (ret_from_fork+0x14/0x2c)
>> [ 24.451049] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=2102 jiffies, g=4294967086, c=4294967085, q=19)
>> [ 24.463134] INFO: Stall ended before state dump start
>> [ 87.501037] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=8407 jiffies, g=4294967086, c=4294967085, q=19)
>> [ 87.513122] INFO: Stall ended before state dump start
>> [ 150.551055] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=14712 jiffies, g=4294967086, c=4294967085, q=19)
>> [ 150.563232] INFO: Stall ended before state dump start
>> [ 213.601043] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=21017 jiffies, g=4294967086, c=4294967085, q=19)
>> [ 213.613220] INFO: Stall ended before state dump start
>> [ 276.651062] INFO: rcu_sched detected stalls on CPUs/tasks: {}
>> (detected by 0, t=27322 jiffies, g=4294967086, c=4294967085, q=19)
>> [ 276.663238] INFO: Stall ended before state dump start
>>
>> BR,
>>
>> marek
>>
>> --
>> as simple and primitive as possible
>> -------------------------------------------------
>> Marek Belisko - OPEN-NANDRA
>> Freelance Developer
>>
>> Ruska Nova Ves 219 | Presov, 08005 Slovak Republic
>> Tel: +421 915 052 184
>> skype: marekwhite
>> twitter: #opennandra
>> web: http://open-nandra.com
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>> Please read the FAQ at http://www.tux.org/lkml/
>
> --
> David Fries <david@xxxxxxxxx> PGP pub CB1EE8F0
> http://fries.net/~david/
>
>
> commit b3afc24a3ca6c261a8fe2c08b5bbccb51233c09a
> Author: David Fries <David@xxxxxxxxx>
> Date: Sun Apr 13 15:36:45 2014 -0500
>
> w1: avoid recursive device_add
>
> __w1_attach_slave_device calls device_add which calls w1_bus_notify
> which calls for the w1_bq27000 slave driver, which calls
> platform_device_add and device_add and deadlocks on getting
> &(&priv->bus_notifier)->rwsem as it is still held in the previous
> device_add. This avoids the problem by processing the family
> add/remove outside of the slave device_add call. This does mean that
> the slave device isn't setup when the notification that a slave is
> available goes out.
>
> Reported-by: Belisko Marek <marek.belisko@xxxxxxxxx>
>
> diff --git a/drivers/w1/w1.c b/drivers/w1/w1.c
> index b96f61b..dde09c1 100644
> --- a/drivers/w1/w1.c
> +++ b/drivers/w1/w1.c
> @@ -614,27 +614,11 @@ end:
> return err;
> }
>
> -/*
> - * Handle sysfs file creation and removal here, before userspace is told that
> - * the device is added / removed from the system
> - */
> -static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
> - void *data)
> +static int w1_family_notify(unsigned long action, struct w1_slave *sl)
> {
> - struct device *dev = data;
> - struct w1_slave *sl;
> struct w1_family_ops *fops;
> int err;
>
> - /*
> - * Only care about slave devices at the moment. Yes, we should use a
> - * separate "type" for this, but for now, look at the release function
> - * to know which type it is...
> - */
> - if (dev->release != w1_slave_release)
> - return 0;
> -
> - sl = dev_to_w1_slave(dev);
> fops = sl->family->fops;
>
> if (!fops)
> @@ -673,10 +657,6 @@ static int w1_bus_notify(struct notifier_block *nb, unsigned long action,
> return 0;
> }
>
> -static struct notifier_block w1_bus_nb = {
> - .notifier_call = w1_bus_notify,
> -};
> -
> static int __w1_attach_slave_device(struct w1_slave *sl)
> {
> int err;
> @@ -705,6 +685,7 @@ static int __w1_attach_slave_device(struct w1_slave *sl)
> dev_name(&sl->dev), err);
> return err;
> }
> + w1_family_notify(BUS_NOTIFY_ADD_DEVICE, sl);
>
>
> dev_set_uevent_suppress(&sl->dev, false);
> @@ -799,6 +780,7 @@ int w1_unref_slave(struct w1_slave *sl)
> msg.type = W1_SLAVE_REMOVE;
> w1_netlink_send(sl->master, &msg);
>
> + w1_family_notify(BUS_NOTIFY_DEL_DEVICE, sl);
> device_unregister(&sl->dev);
> #ifdef DEBUG
> memset(sl, 0, sizeof(*sl));
> @@ -1186,10 +1168,6 @@ static int __init w1_init(void)
> goto err_out_exit_init;
> }
>
> - retval = bus_register_notifier(&w1_bus_type, &w1_bus_nb);
> - if (retval)
> - goto err_out_bus_unregister;
> -
> retval = driver_register(&w1_master_driver);
> if (retval) {
> printk(KERN_ERR

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/