Re: Freezable workqueue blocks non-freezable workqueue during the system resume process

From: Peter Chen
Date: Wed Feb 24 2016 - 02:29:00 EST


On Tue, Feb 23, 2016 at 10:34:09AM -0500, Alan Stern wrote:
> On Tue, 23 Feb 2016, Peter Chen wrote:
>
> > Hi Tejun Heo and Florian Mickler,
> >
> > I have a question: during the system resume process, can a freezable
> > workqueue be thawed while a non-freezable workqueue is blocked (in
> > uninterruptible state)?
> >
> > My case is as follows: a USB OTG (Micro-AB) cable is plugged into a
> > USB Micro-B port with a USB drive attached, and unplugging this
> > cable can wake the system from suspend. A non-freezable workqueue,
> > ci_otg, is scheduled after the OTG cable is disconnected, and its
> > worker, ci_otg_work, tries to disconnect the USB drive and flush
> > the disk information.
>
> These operations probably are not safe while the system is resuming.
> It might be best to make them wait until the resume is finished.
>
> > But flushing the disk information is done by the freezable
> > workqueue writeback. It seems the writeback workqueue never gets a
> > chance to execute, so the ci_otg workqueue waits there forever and
> > the system deadlocks.
>
> > Either making workqueue ci_otg freezable or making workqueue
> > writeback non-freezable fixes this problem.
>
> It sounds like making ci_otg freezable is the easiest solution.
>
> > Please ignore it; the system is locked in a driver's resume,
> > maybe in the scsi or usb driver, so of course the freezable
> > processes can't be thawed.
>
> > > [ 555.263177] [<c0043b1c>] (flush_work) from [<c0043fac>] (flush_delayed_work+0x48/0x4c)
> > > [ 555.271106] r8:ed5b5000 r7:c0b38a3c r6:eea439cc r5:eea4372c r4:eea4372c
> > > [ 555.277958] [<c0043f64>] (flush_delayed_work) from [<c00eae18>] (bdi_unregister+0x84/0xec)
> > > [ 555.286236] r4:eea43520 r3:20000153
> > > [ 555.289885] [<c00ead94>] (bdi_unregister) from [<c02c2154>] (blk_cleanup_queue+0x180/0x29c)
> > > [ 555.298250] r5:eea43808 r4:eea43400
>
> You might want to complain to the block-layer people about this. I
> don't know if anything can be done to fix it.
>
> Or maybe flush_work and flush_delayed_work can be changed to avoid
> blocking if the workqueue is frozen. Tejun?
>

I have a patch to show the root cause of this issue.

http://www.spinics.net/lists/linux-usb/msg136815.html
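
To make the dependency chain easier to see, here is a toy user-space model
of it. This is not kernel code and it is not the patch above; the actor
names and both helper functions are made up for illustration. It also
assumes that the driver resume path ends up waiting on something the
ci_otg worker holds, which the traces suggest but which is not confirmed
here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical actors in the resume sequence (illustration only). */
enum actor {
	RESUME_PATH,     /* device/driver resume callbacks */
	CI_OTG_WORK,     /* work item on the ci_otg workqueue */
	WRITEBACK_WORK,  /* work item on the freezable writeback workqueue */
	THAW_STEP,       /* thawing of freezable workqueues */
	NUM_ACTORS,
	NOBODY = -1
};

/* Fill waits_on[a] with the actor that 'a' is blocked on, or NOBODY. */
static void build_waits(int waits_on[NUM_ACTORS], bool ci_otg_freezable)
{
	/* Freezable workqueues are thawed only after device resume returns. */
	waits_on[THAW_STEP] = RESUME_PATH;
	/* Frozen writeback work cannot run until the thaw step. */
	waits_on[WRITEBACK_WORK] = THAW_STEP;

	if (ci_otg_freezable) {
		/* ci_otg work is deferred until thaw, so resume waits on nothing. */
		waits_on[CI_OTG_WORK] = THAW_STEP;
		waits_on[RESUME_PATH] = NOBODY;
	} else {
		/* ci_otg work runs early and flushes the frozen writeback work. */
		waits_on[CI_OTG_WORK] = WRITEBACK_WORK;
		/* ASSUMPTION: driver resume is blocked behind the ci_otg worker. */
		waits_on[RESUME_PATH] = CI_OTG_WORK;
	}
}

/* Follow the wait chain from 'start'; true if it leads back to 'start'. */
static bool is_on_wait_cycle(const int waits_on[NUM_ACTORS], int start)
{
	int cur = start;

	assert(start >= 0 && start < NUM_ACTORS);
	for (int steps = 0; steps < NUM_ACTORS; steps++) {
		cur = waits_on[cur];
		if (cur == NOBODY)
			return false;
		if (cur == start)
			return true;
	}
	return false;
}
```

Calling build_waits() with ci_otg_freezable set to false and then
is_on_wait_cycle() for RESUME_PATH reports a cycle (resume -> ci_otg ->
writeback -> thaw -> resume). With ci_otg_freezable set to true, the chain
from RESUME_PATH ends at NOBODY, so the model shows no deadlock.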

--

Best Regards,
Peter Chen