Re: [RFC] deadlock with flush_work() in UAS

From: Alan Stern
Date: Tue Jun 18 2019 - 12:04:43 EST


Tejun and other workqueue maintainers:

On Tue, 18 Jun 2019, Oliver Neukum wrote:

> Am Dienstag, den 18.06.2019, 11:29 -0400 schrieb Alan Stern:
> > On Tue, 18 Jun 2019, Oliver Neukum wrote:
> >
> > > Hi,
> > >
> > > looking at those deadlocks it looks to me like UAS can
> > > deadlock on itself. What do you think?
> > >
> > > Regards
> > > Oliver
> > >
> > > From 2d497f662e6c03fe9e0a75e05b64d52514e890b3 Mon Sep 17 00:00:00 2001
> > > From: Oliver Neukum <oneukum@xxxxxxxx>
> > > Date: Tue, 18 Jun 2019 15:03:56 +0200
> > > Subject: [PATCH] UAS: fix deadlock in error handling and PM flushing work
> > >
> > > The SCSI error handler and block-layer runtime PM must not
> > > allocate memory with GFP_KERNEL, and they must not wait for
> > > tasks that allocate memory with GFP_KERNEL. That means they
> > > cannot share a workqueue with arbitrary tasks.
> > >
> > > Fix this for UAS using a private workqueue.
> >
> > I'm not so sure that one long-running task in a workqueue will block
> > other tasks. Workqueues have variable numbers of threads, added and
> > removed on demand. (On the other hand, when new threads need to be
> > added the workqueue manager probably uses GFP_KERNEL.)
>
> Do we have a guarantee that it will still run already-queued work items?
> The deadlock would be something like
>
> uas_pre_reset() -> uas_wait_for_pending_cmnds() ->
> flush_work(&devinfo->work) -> kmalloc() -> DEADLOCK
>
> You can also construct this chain with uas_suspend().
>
> > Even if you disagree, perhaps we should have a global workqueue with a
> > permanently set noio flag. It could be shared among multiple drivers
> > such as uas and the hub driver for purposes like this. (In fact, the
> > hub driver already has its own dedicated workqueue.)
>
> That is a good idea. But does UAS need WQ_MEM_RECLAIM?
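
[For concreteness, the private-workqueue approach the patch describes could
look roughly like this. This is a sketch only; the identifiers (uas_wq, the
init/exit hooks, uas_driver) are assumed for illustration and are not taken
from the actual patch:]

```c
/* Sketch (assumed names): a driver-private workqueue for uas.
 * WQ_MEM_RECLAIM guarantees a dedicated rescuer thread, so work
 * queued here can make forward progress even under memory pressure,
 * when spawning a new worker kthread (a GFP_KERNEL operation) could
 * block indefinitely.
 */
#include <linux/workqueue.h>

static struct workqueue_struct *uas_wq;

static int __init uas_init(void)
{
	uas_wq = alloc_workqueue("uas", WQ_MEM_RECLAIM, 0);
	if (!uas_wq)
		return -ENOMEM;
	return usb_register(&uas_driver);
}

static void __exit uas_exit(void)
{
	usb_deregister(&uas_driver);
	destroy_workqueue(uas_wq);
}
```

[Work items would then be queued with queue_work(uas_wq, &devinfo->work)
rather than schedule_work(&devinfo->work), so that flush_work() called from
uas_pre_reset() or uas_suspend() never waits on the system-wide pool.]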

These are good questions, and I don't have the answers. Perhaps Tejun
or someone else on LKML can help.
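
[One way a "permanently noio" shared workqueue could be approximated, as a
sketch under the assumption that each participating work function opts in
itself: bracket the work function with memalloc_noio_save() and
memalloc_noio_restore(), which force any nested GFP_KERNEL allocation down
to GFP_NOIO for the duration:]

```c
/* Sketch only: make every allocation inside a work item implicitly
 * GFP_NOIO.  memalloc_noio_save() sets PF_MEMALLOC_NOIO on the
 * current task, so allocations made while it is in effect cannot
 * recurse into I/O; memalloc_noio_restore() undoes it.
 */
#include <linux/sched/mm.h>
#include <linux/workqueue.h>

static void noio_work_fn(struct work_struct *work)
{
	unsigned int noio_flags = memalloc_noio_save();

	/* ... driver work that may allocate memory ... */

	memalloc_noio_restore(noio_flags);
}
```

[Note this addresses only the allocation context, not forward-progress
guarantees; whether WQ_MEM_RECLAIM is also needed is the open question
above.]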

Alan Stern