Re: [PATCH] char: misc: make misc_open() and misc_register() killable
From: Greg KH
Date: Wed Jul 06 2022 - 02:34:54 EST
On Wed, Jul 06, 2022 at 03:21:15PM +0900, Tetsuo Handa wrote:
> On 2022/07/05 23:35, Tetsuo Handa wrote:
> > On 2022/07/05 23:16, Greg KH wrote:
> >> Some device is being probed at the moment, maybe we have a deadlock
> >> somewhere here...
> >
> > Lockdep says __device_attach() from hub_event() was in progress.
> >
> > ----------------------------------------
> > [ 237.376478][ T28] 5 locks held by kworker/1:1/26:
> > [ 237.381526][ T28] #0: ffff888016b92538 ((wq_completion)usb_hub_wq){+.+.}-{0:0}, at: process_one_work+0x87a/0x1610
> > [ 237.392798][ T28] #1: ffffc90000c2fda8 ((work_completion)(&hub->events)){+.+.}-{0:0}, at: process_one_work+0x8ae/0x1610
> > [ 237.406354][ T28] #2: ffff88801f7ee220 (&dev->mutex){....}-{3:3}, at: hub_event+0x1c1/0x4680
> > [ 237.415920][ T28] #3: ffff88801b6c6220 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0
> > [ 237.426682][ T28] #4: ffff8880216bc1a8 (&dev->mutex){....}-{3:3}, at: __device_attach+0x7a/0x4a0
> > ----------------------------------------
> >
>
> The number returned by atomic_read(&probe_count) matched the number of works for hub_event()
> in usb_hub_wq workqueue. The probe function is called from hub_event(), and
> usb_stor_msg_common() calls wait_for_completion_interruptible_timeout(MAX_SCHEDULE_TIMEOUT)
> via driver's init function.
>
> But if the usb device is unresponsive, wait_for_completion_interruptible_timeout() sleeps
> forever. And in this testcase (which emulates usb devices using /dev/raw-gadget interface),
> the usb device became unresponsive because the process who is responsible with reading/writing
> /dev/raw-gadget interface is blocked at mutex_lock(&misc_mtx) at misc_open(), and results in
> an AB-BA deadlock condition. Making misc_open() killable solved this problem, by allowing
> the opener of /dev/raw-gadget interface to call fput() upon "send SIGKILL after 5 seconds from
> fork()" behavior.
>
> Anyway,
>
> /*
> * Resuming. We may need to wait for the image device to
> * appear.
> */
> wait_for_device_probe();
>
> in snapshot_open() will sleep forever if some device became unresponsive.
>
> How should we fix this problem?
We can decrease the timeout in usb_stor_msg_common(). I imagine that if
that timeout is ever hit in this sequence, then all will recover, right?
Try decreasing it to a sane number and see what happens.
thanks,
greg k-h