Re: [PATCH -next] driver core: fix deadlock in __driver_attach

From: Greg KH
Date: Tue Jun 21 2022 - 15:34:35 EST



A: http://en.wikipedia.org/wiki/Top_post
Q: Were do I find info about this thing called top-posting?
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing in e-mail?

A: No.
Q: Should I include quotations after my reply?


http://daringfireball.net/2007/07/on_top

On Thu, Jun 16, 2022 at 04:00:58PM +0800, zhangwensheng (E) wrote:
> sorry that I didn't see your reply.
> it is real not potential, I have triggered this problem successfully and
> proven that this change can fix it.
>
> stack like commit b232b02bf3c2 ("driver core: fix deadlock in
> __device_attach").
> list below:
>     In __driver_attach function, The lock holding logic is as follows:
>     ...
>     __driver_attach
>     if (driver_allows_async_probing(drv))
>       device_lock(dev)      // get lock dev
>         async_schedule_dev(__driver_attach_async_helper, dev); // func
>           async_schedule_node
>             async_schedule_node_domain(func)
>               entry = kzalloc(sizeof(struct async_entry), GFP_ATOMIC);
>               /* when fail or work limit, sync to execute func, but
>                  __driver_attach_async_helper will get lock dev as
>                  will, which will lead to A-A deadlock.  */
>               if (!entry || atomic_read(&entry_count) > MAX_WORK) {
>                 func;
>               else
>                 queue_work_node(node, system_unbound_wq, &entry->work)
>       device_unlock(dev)
>
>     As above show, when it is allowed to do async probes, because of
>     out of memory or work limit, async work is not be allowed, to do
>     sync execute instead. it will lead to A-A deadlock because of
>     __driver_attach_async_helper getting lock dev.
>
>     Because it's logic is same as commit b232b02bf3c2 ("driver core: fix
> deadlock
>     in __device_attach"),  I simplify the description.
>
>
> Reproduce:
> and it can be reproduce by make the condition
> (if (!entry || atomic_read(&entry_count) > MAX_WORK)) untenable, like below:
>
> [  370.785650] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  370.787154] task:swapper/0       state:D stack:    0 pid:    1 ppid:    
> 0 flags:0x00004000
> [  370.788865] Call Trace:
> [  370.789374]  <TASK>
> [  370.789841]  __schedule+0x482/0x1050
> [  370.790613]  schedule+0x92/0x1a0
> [  370.791290]  schedule_preempt_disabled+0x2c/0x50
> [  370.792256]  __mutex_lock.isra.0+0x757/0xec0
> [  370.793158]  __mutex_lock_slowpath+0x1f/0x30
> [  370.794079]  mutex_lock+0x50/0x60
> [  370.794795]  __device_driver_lock+0x2f/0x70
> [  370.795677]  ? driver_probe_device+0xd0/0xd0
> [  370.796576]  __driver_attach_async_helper+0x1d/0xd0
> [  370.797318]  ? driver_probe_device+0xd0/0xd0
> [  370.797957]  async_schedule_node_domain+0xa5/0xc0
> [  370.798652]  async_schedule_node+0x19/0x30
> [  370.799243]  __driver_attach+0x246/0x290
> [  370.799828]  ? driver_allows_async_probing+0xa0/0xa0
> [  370.800548]  bus_for_each_dev+0x9d/0x130
> [  370.801132]  driver_attach+0x22/0x30
> [  370.801666]  bus_add_driver+0x290/0x340
> [  370.802246]  driver_register+0x88/0x140
> [  370.802817]  ? virtio_scsi_init+0x116/0x116
> [  370.803425]  scsi_register_driver+0x1a/0x30
> [  370.804057]  init_sd+0x184/0x226
> [  370.804533]  do_one_initcall+0x71/0x3a0
> [  370.805107]  kernel_init_freeable+0x39a/0x43a
> [  370.805759]  ? rest_init+0x150/0x150
> [  370.806283]  kernel_init+0x26/0x230
> [  370.806799]  ret_from_fork+0x1f/0x30
>
> And my change can fix it.

Ok, please put that type of information in the changelog text.

thanks,

greg k-h