Re: [PATCH v1] driver core: Fix scheduling while atomic warnings during device link deletion

From: Marek Szyprowski
Date: Thu Jul 16 2020 - 01:48:50 EST


Hi

On 16.07.2020 07:30, Guenter Roeck wrote:
> On 7/15/20 10:08 PM, Saravana Kannan wrote:
>> Marek and Guenter reported that commit 287905e68dd2 ("driver core:
>> Expose device link details in sysfs") caused sleeping/scheduling while
>> atomic warnings.
>>
>> BUG: sleeping function called from invalid context at kernel/locking/mutex.c:935
>> in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 12, name: kworker/0:1
>> 2 locks held by kworker/0:1/12:
>> #0: ee8074a8 ((wq_completion)rcu_gp){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
>> #1: ee921f20 ((work_completion)(&sdp->work)){+.+.}-{0:0}, at: process_one_work+0x174/0x7dc
>> Preemption disabled at:
>> [<c01b10f0>] srcu_invoke_callbacks+0xc0/0x154
>> ----- 8< ----- SNIP
>> [<c064590c>] (device_del) from [<c0645c9c>] (device_unregister+0x24/0x64)
>> [<c0645c9c>] (device_unregister) from [<c01b10fc>] (srcu_invoke_callbacks+0xcc/0x154)
>> [<c01b10fc>] (srcu_invoke_callbacks) from [<c01493c4>] (process_one_work+0x234/0x7dc)
>> [<c01493c4>] (process_one_work) from [<c01499b0>] (worker_thread+0x44/0x51c)
>> [<c01499b0>] (worker_thread) from [<c0150bf4>] (kthread+0x158/0x1a0)
>> [<c0150bf4>] (kthread) from [<c0100114>] (ret_from_fork+0x14/0x20)
>> Exception stack(0xee921fb0 to 0xee921ff8)
>>
>> This was caused by the device link device being released in the context
>> of srcu_invoke_callbacks(). There is no need to wait till the RCU
>> callback to release the device link device. So release the device
>> earlier and revert the RCU callback code to what it was before
>> commit 287905e68dd2 ("driver core: Expose device link details in sysfs")
>>
>> Fixes: 287905e68dd2 ("driver core: Expose device link details in sysfs")
>> Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
>> Reported-by: Guenter Roeck <linux@xxxxxxxxxxxx>
>> Signed-off-by: Saravana Kannan <saravanak@xxxxxxxxxx>
>> ---
>> Marek and Guenter,
>>
>> I haven't had a chance to test this yet. Can one of you please test it
>> and confirm it fixes the issue?
>>
> With this patch applied, the original warning is gone, but I get lots
> of other warnings.
>
> WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
> Device 'regulators:regulator@0:50038000.ethernet' does not have a release() function, it is broken and must be fixed.
>
> WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
> Device '53f9c000.gpio:50038000.ethernet' does not have a release() function, it is broken and must be fixed.
>
> WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0xa4
> Device '50030000.tscadc:50030400.tcq' does not have a release() function, it is broken and must be fixed.

I can confirm that I also get such warnings for every platform device in
the system with this patch applied to linux-next (next-20200715):

------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at drivers/base/core.c:1790 device_release+0x94/0x98
Device '10023c40.power-domain:13620000.sysmmu' does not have a release() function, it is broken and must be fixed. See Documentation/core-api/kobject.rst.
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.8.0-rc5-next-20200715-00002-g0f637964c4b0 #1270
Hardware name: Samsung Exynos (Flattened Device Tree)
[<c011184c>] (unwind_backtrace) from [<c010d250>] (show_stack+0x10/0x14)
[<c010d250>] (show_stack) from [<c051b8fc>] (dump_stack+0xbc/0xe8)
[<c051b8fc>] (dump_stack) from [<c0126ed8>] (__warn+0xf0/0x108)
[<c0126ed8>] (__warn) from [<c0126f64>] (warn_slowpath_fmt+0x74/0xb8)
[<c0126f64>] (warn_slowpath_fmt) from [<c064a2a0>] (device_release+0x94/0x98)
[<c064a2a0>] (device_release) from [<c0522178>] (kobject_put+0x104/0x288)
[<c0522178>] (kobject_put) from [<c064b45c>] (__device_link_del+0x38/0xac)
[<c064b45c>] (__device_link_del) from [<c064c1f0>] (device_links_driver_bound+0x260/0x26c)
[<c064c1f0>] (device_links_driver_bound) from [<c0650af0>] (driver_bound+0x5c/0x110)
[<c0650af0>] (driver_bound) from [<c0651038>] (really_probe+0x2d4/0x4fc)
[<c0651038>] (really_probe) from [<c06513c8>] (driver_probe_device+0x78/0x1fc)
[<c06513c8>] (driver_probe_device) from [<c064ee00>] (bus_for_each_drv+0x74/0xb8)
[<c064ee00>] (bus_for_each_drv) from [<c0650cc4>] (__device_attach+0xd4/0x16c)
[<c0650cc4>] (__device_attach) from [<c064fdc4>] (bus_probe_device+0x88/0x90)
[<c064fdc4>] (bus_probe_device) from [<c064c604>] (fw_devlink_resume+0xa0/0x134)
[<c064c604>] (fw_devlink_resume) from [<c102bfd4>] (of_platform_default_populate_init+0xa8/0xc0)
[<c102bfd4>] (of_platform_default_populate_init) from [<c0102378>] (do_one_initcall+0x8c/0x424)
[<c0102378>] (do_one_initcall) from [<c1001158>] (kernel_init_freeable+0x190/0x204)
[<c1001158>] (kernel_init_freeable) from [<c0ac05d0>] (kernel_init+0x8/0x118)
[<c0ac05d0>] (kernel_init) from [<c0100114>] (ret_from_fork+0x14/0x20)
Exception stack(0xef0dffb0 to 0xef0dfff8)
ffa0:                                     00000000 00000000 00000000 00000000
ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
irq event stamp: 40543
hardirqs last enabled at (40551): [<c019d624>] console_unlock+0x430/0x6cc
hardirqs last disabled at (40568): [<c019d348>] console_unlock+0x154/0x6cc
softirqs last enabled at (40584): [<c010174c>] __do_softirq+0x50c/0x608
softirqs last disabled at (40595): [<c0130218>] irq_exit+0x168/0x16c
---[ end trace 1d4780a89f63483a ]---

> and so on. I don't know if this is caused by this patch or by
> some other patch in -next.

This is caused by commit 287905e68dd2 ("driver core: Expose device link
details in sysfs"). If you revert it, the warning will go away.
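
For anyone not familiar with where this warning comes from: judging by
the backtrace, it fires in device_release() when the last reference to a
device is dropped and no release() callback can be found for it. Below is
a rough userspace sketch of that pattern only. All toy_* names are
hypothetical and are not kernel APIs, and the real check in
drivers/base/core.c is more involved than this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a refcounted device; not struct device. */
struct toy_device {
	const char *name;
	int refcount;
	bool freed;
	void (*release)(struct toy_device *dev);
};

/* Counts how many times the "no release()" warning was emitted. */
int toy_warnings;

void toy_device_init(struct toy_device *dev, const char *name,
		     void (*release)(struct toy_device *dev))
{
	dev->name = name;
	dev->refcount = 1;	/* the creator holds the initial reference */
	dev->freed = false;
	dev->release = release;
}

/* Drop one reference; on the last put, call release() or warn if it is missing. */
void toy_put_device(struct toy_device *dev)
{
	assert(dev->refcount > 0);
	if (--dev->refcount > 0)
		return;

	if (dev->release) {
		dev->release(dev);
	} else {
		toy_warnings++;
		fprintf(stderr,
			"Device '%s' does not have a release() function\n",
			dev->name);
	}
}

/* Example release callback; real code would free the containing object here. */
void toy_link_release(struct toy_device *dev)
{
	dev->freed = true;
}
```

In this model, a device initialised with a NULL release callback warns on
its final put, while one initialised with toy_link_release() is cleaned up
silently. I assume the link devices hit the first case here, but I have
not checked that in the code.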

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland