Re: Lockdep warning caused by "driver core: Fix sleeping in invalid context during device link deletion"
From: Saravana Kannan
Date: Mon Aug 31 2020 - 18:15:43 EST
On Wed, Aug 26, 2020 at 10:17 PM Saravana Kannan <saravanak@xxxxxxxxxx> wrote:
>
> On Thu, Aug 20, 2020 at 8:50 PM Dong Aisheng <dongas86@xxxxxxxxx> wrote:
> >
> > Hi ALL,
> >
> > We hit the WARNING below during system suspend on an iMX6Q SDB board
> > with the latest linus/master branch (v5.9-rc1+) and next-20200820.
> > The v5.8 kernel is OK, so I bisected and found that it's caused by
> > the patch below.
> > Reverting it can get rid of the warning, but I wonder if there may be
> > other potential issues.
> > Any ideas?
> >
> > Defconfig used is: imx_v6_v7_defconfig
> >
>
> ----- 8< ----- Snipped text that was a bit misleading
>
> >
> > Error log:
> > # echo mem > /sys/power/state
> > [ 39.111865] PM: suspend entry (deep)
> > [ 39.148650] Filesystems sync: 0.032 seconds
> > [ 39.154034]
> > [ 39.155537] ======================================================
> > [ 39.161723] WARNING: possible circular locking dependency detected
> > [ 39.167911] 5.9.0-rc1-00103-g7eac66d0456f #37 Not tainted
> > [ 39.173315] ------------------------------------------------------
> > [ 39.179500] sh/647 is trying to acquire lock:
> > [ 39.183862] c15a310c (dpm_list_mtx){+.+.}-{3:3}, at: dpm_for_each_dev+0x20/0x5c
> > [ 39.191200]
> > [ 39.191200] but task is already holding lock:
> > [ 39.197036] c15a37e4 (fw_lock){+.+.}-{3:3}, at: fw_pm_notify+0x90/0xd4
> > [ 39.203582]
> > [ 39.203582] which lock already depends on the new lock.
> > [ 39.203582]
> > [ 39.211763]
> > [ 39.211763] the existing dependency chain (in reverse order) is:
> > [ 39.219249]
> > [ 39.219249] -> #2 (fw_lock){+.+.}-{3:3}:
> > [ 39.224673] mutex_lock_nested+0x1c/0x24
> > [ 39.229126] firmware_uevent+0x18/0xa0
> > [ 39.233411] dev_uevent+0xc4/0x1f8
> > [ 39.237343] uevent_show+0x98/0x114
> > [ 39.241362] dev_attr_show+0x18/0x48
> > [ 39.245472] sysfs_kf_seq_show+0x84/0xec
> > [ 39.249927] seq_read+0x138/0x550
> > [ 39.253774] vfs_read+0x94/0x164
> > [ 39.257529] ksys_read+0x60/0xe8
> > [ 39.261288] ret_fast_syscall+0x0/0x28
> > [ 39.265564] 0xbed7c808
> > [ 39.268538]
> > [ 39.268538] -> #1 (kn->active#3){++++}-{0:0}:
> > [ 39.274391] kernfs_remove_by_name_ns+0x40/0x94
> > [ 39.279450] device_del+0x144/0x3fc
>
> Rafael/Greg,
>
> I'm not very familiar with the #0 and #2 call stacks. But from poking
> around a bit, they are NOT due to the device-link-device. The new
> part is the two lines above, which delete the device-link-device
> (that's used to expose device link details in sysfs) when the device
> link is deleted.
>
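To make the cycle concrete, here is a rough userspace model of the lock-order graph that the report implies. It is only a sketch, not kernel code. The #0 entry is not part of the quoted log, so the first edge (kn->active taken while dpm_list_mtx is held) is an assumption inferred from the "trying to acquire" line; find_cycle() is a hypothetical helper written for this illustration.

```python
# Toy model of the lock-order graph implied by the report; not kernel code.
# An edge (A, B) means "B was acquired while A was held".


def find_cycle(edges, start):
    """Return a path start -> ... -> start if one exists, else None."""
    graph = {}
    for held, acquired in edges:
        graph.setdefault(held, []).append(acquired)

    def dfs(node, path, seen):
        for nxt in graph.get(node, []):
            if nxt == start:
                return path + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                found = dfs(nxt, path + [nxt], seen)
                if found:
                    return found
        return None

    return dfs(start, [start], {start})


existing = [
    # Assumed from #1: device_del() runs while dpm_list_mtx is held.
    ("dpm_list_mtx", "kn->active"),
    # From #2: uevent_show() -> firmware_uevent() takes fw_lock.
    ("kn->active", "fw_lock"),
]
# From the report header: fw_pm_notify() holds fw_lock and
# dpm_for_each_dev() then tries to take dpm_list_mtx.
new_edge = ("fw_lock", "dpm_list_mtx")

cycle = find_cycle(existing + [new_edge], "dpm_list_mtx")
print(" -> ".join(cycle))
```

Without the new fw_lock -> dpm_list_mtx edge the same search finds no cycle, which is why removing any one edge (for example, not calling device_del() under the lock assumed above) would be enough to silence the warning.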
> Kicking off a workqueue to break this cycle is easy, but the problem
> is that if I queue a work to delete the device, then the sysfs folder
> won't get removed immediately. And if the same link is created again
> before the work is completed, then there'll be a sysfs name collision
> and warning.
>
> So, I'm kinda stuck here. Open to suggestions. Hoping you'll have
> better ideas for breaking the cycle. Or point out how I'm
> misunderstanding the cycle here.
>
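The name-collision concern with deferred deletion quoted above can be modelled in a few lines. This is a simplified sketch under stated assumptions: SysfsModel, its methods, and the link name are all hypothetical stand-ins, and the deque only imitates a work item that has been queued but has not run yet.

```python
from collections import deque


class SysfsModel:
    """Toy stand-in for a sysfs directory: entry names must be unique."""

    def __init__(self):
        self.entries = set()
        self.pending = deque()  # stand-in for queued, not-yet-run work items

    def add(self, name):
        if name in self.entries:
            return False  # the real kernel would warn about a duplicate name
        self.entries.add(name)
        return True

    def remove_deferred(self, name):
        # The entry stays visible until the queued work actually runs.
        self.pending.append(name)

    def run_pending(self):
        while self.pending:
            self.entries.discard(self.pending.popleft())


sysfs = SysfsModel()
link = "consumer--supplier"  # hypothetical link name, for illustration only

created = sysfs.add(link)          # link created
sysfs.remove_deferred(link)        # link deleted, removal only queued
collided = not sysfs.add(link)     # same link re-created before the work ran
sysfs.run_pending()                # queued removal finally runs
recreated = sysfs.add(link)        # now the name is free again
print(created, collided, recreated)
```

In this model the second add() fails only because it races ahead of the queued removal; once the pending work has run, the same name can be added again.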
Aisheng,
Sent out a fix that I think should work.
https://lore.kernel.org/lkml/20200831221007.1506441-1-saravanak@xxxxxxxxxx/T/#u
I wasn't able to reproduce it on my hardware. So, if you can test that
patch (and respond to that thread), that'd be great.
-Saravana