Re: [PATCH -next] driver core: fix deadlock in __driver_attach
From: Greg KH
Date: Tue Jun 21 2022 - 15:34:35 EST
A: http://en.wikipedia.org/wiki/Top_post
Q: Where do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?
A: No.
Q: Should I include quotations after my reply?
http://daringfireball.net/2007/07/on_top
On Thu, Jun 16, 2022 at 04:00:58PM +0800, zhangwensheng (E) wrote:
> Sorry, I didn't see your reply.
> The problem is real, not potential: I have triggered it successfully and
> have proven that this change fixes it.
>
> The stack is like the one in commit b232b02bf3c2 ("driver core: fix
> deadlock in __device_attach"), and is listed below.
> In the __driver_attach function, the lock holding logic is as follows:
> ...
> __driver_attach
>     if (driver_allows_async_probing(drv))
>         device_lock(dev)      // get lock dev
>         async_schedule_dev(__driver_attach_async_helper, dev); // func
>             async_schedule_node
>                 async_schedule_node_domain(func)
>                     entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);
>                     /* when fail or work limit, sync to execute func, but
>                        __driver_attach_async_helper will get lock dev as
>                        well, which will lead to A-A deadlock. */
>                     if (!entry || atomic_read(&entry_count) > MAX_WORK) {
>                         func;
>                     } else {
>                         queue_work_node(node, system_unbound_wq, &entry->work)
>                     }
>         device_unlock(dev)
>
> As shown above, when async probing is allowed but the async work cannot
> be queued because of an out-of-memory condition or the work limit, the
> function is executed synchronously instead. This leads to an A-A deadlock
> because __driver_attach_async_helper takes the device lock again.
>
> Because the logic is the same as in commit b232b02bf3c2 ("driver core: fix
> deadlock in __device_attach"), I simplified the description.
>
>
> Reproduce:
> It can be reproduced by forcing the condition
> (if (!entry || atomic_read(&entry_count) > MAX_WORK)) to hold, like below:
>
> [ 370.785650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 370.787154] task:swapper/0 state:D stack: 0 pid: 1 ppid: 0 flags:0x00004000
> [ 370.788865] Call Trace:
> [ 370.789374] <TASK>
> [ 370.789841] __schedule+0x482/0x1050
> [ 370.790613] schedule+0x92/0x1a0
> [ 370.791290] schedule_preempt_disabled+0x2c/0x50
> [ 370.792256] __mutex_lock.isra.0+0x757/0xec0
> [ 370.793158] __mutex_lock_slowpath+0x1f/0x30
> [ 370.794079] mutex_lock+0x50/0x60
> [ 370.794795] __device_driver_lock+0x2f/0x70
> [ 370.795677] ? driver_probe_device+0xd0/0xd0
> [ 370.796576] __driver_attach_async_helper+0x1d/0xd0
> [ 370.797318] ? driver_probe_device+0xd0/0xd0
> [ 370.797957] async_schedule_node_domain+0xa5/0xc0
> [ 370.798652] async_schedule_node+0x19/0x30
> [ 370.799243] __driver_attach+0x246/0x290
> [ 370.799828] ? driver_allows_async_probing+0xa0/0xa0
> [ 370.800548] bus_for_each_dev+0x9d/0x130
> [ 370.801132] driver_attach+0x22/0x30
> [ 370.801666] bus_add_driver+0x290/0x340
> [ 370.802246] driver_register+0x88/0x140
> [ 370.802817] ? virtio_scsi_init+0x116/0x116
> [ 370.803425] scsi_register_driver+0x1a/0x30
> [ 370.804057] init_sd+0x184/0x226
> [ 370.804533] do_one_initcall+0x71/0x3a0
> [ 370.805107] kernel_init_freeable+0x39a/0x43a
> [ 370.805759] ? rest_init+0x150/0x150
> [ 370.806283] kernel_init+0x26/0x230
> [ 370.806799] ret_from_fork+0x1f/0x30
>
> And my change can fix it.
Ok, please put that type of information in the changelog text.
thanks,
greg k-h
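
The A-A deadlock described in the quoted report can be modeled in user
space. The following is only a rough sketch, not kernel code: every name
in it (model_dev, async_helper, schedule_async, attach_schedule_locked,
attach_schedule_unlocked) is hypothetical, and an error-checking pthread
mutex stands in for the device lock so that the second lock attempt
returns EDEADLK instead of hanging the way a kernel mutex would. The
"unlocked" variant is an assumption about the general shape of a fix
(schedule only after dropping the lock); it is not taken from the patch
under discussion.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a device; the mutex models the device lock. */
struct model_dev {
	pthread_mutex_t lock;
};

/* Models __driver_attach_async_helper(): it takes the device lock itself. */
static int async_helper(struct model_dev *dev)
{
	int err = pthread_mutex_lock(&dev->lock);

	if (err)
		return err;	/* EDEADLK if this thread already holds the lock */
	/* ... probing would happen here ... */
	pthread_mutex_unlock(&dev->lock);
	return 0;
}

/*
 * Models the fallback in async_schedule_node_domain(): when the entry
 * cannot be allocated or too much work is pending (must_run_sync), func
 * runs synchronously in the caller's context.
 */
static int schedule_async(int (*func)(struct model_dev *),
			  struct model_dev *dev, bool must_run_sync)
{
	if (must_run_sync)
		return func(dev);
	/* The real code would queue the work; the model just reports success. */
	return 0;
}

/* Pattern from the report: the work is scheduled while the lock is held. */
static int attach_schedule_locked(struct model_dev *dev, bool must_run_sync)
{
	int ret;

	pthread_mutex_lock(&dev->lock);
	ret = schedule_async(async_helper, dev, must_run_sync);
	pthread_mutex_unlock(&dev->lock);
	return ret;
}

/* Assumed alternative: decide under the lock, schedule after unlocking. */
static int attach_schedule_unlocked(struct model_dev *dev, bool must_run_sync)
{
	pthread_mutex_lock(&dev->lock);
	/* ... record under the lock that async probing is wanted ... */
	pthread_mutex_unlock(&dev->lock);
	return schedule_async(async_helper, dev, must_run_sync);
}

/* Set up the lock as an error-checking mutex so relocking is reported. */
static int model_dev_init(struct model_dev *dev)
{
	pthread_mutexattr_t attr;
	int err = pthread_mutexattr_init(&attr);

	if (err)
		return err;
	err = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	if (!err)
		err = pthread_mutex_init(&dev->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	return err;
}
```

After model_dev_init(&dev), calling attach_schedule_locked(&dev, true)
is the case where the model reports the self-deadlock, while
attach_schedule_unlocked(&dev, true) runs the helper without holding the
lock. When must_run_sync is false, neither variant touches the lock
twice, which matches the report's point that the problem only appears on
the synchronous fallback path.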