Re: Strange block/scsi/workqueue issue

From: Tejun Heo
Date: Tue Apr 12 2011 - 01:15:24 EST


Hello,

On Mon, Apr 11, 2011 at 11:49:17PM -0500, James Bottomley wrote:
> > But confused here. Why does it make any difference whether the
> > release operation is in the request_fn context or not? What makes
> > SCSI refcounting different from others?
>
> I didn't say it did. SCSI refcounting is fairly standard.
>
> The problem isn't really anything to do with SCSI ... it's the way block
> queue destruction must now be called. The block queue destruction
> includes a synchronous flush of the work queue. That means it can't be
> called from the executing workqueue without deadlocking. The last put
> of a SCSI device destroys the queue. This now means that the last put
> of the SCSI device can't be in the block delay work path. However, as
> the device shuts down that can very well wind up happening if
> blk_delay_queue() ends up being called as the device is dying.

Yeah, I understood that part. I thought you were referring to the
problem caused by the proposed workqueue replacement in the patch when
you talked about workqueue related issues in the previous message, and
saying that there's an inherent problem with using workqueue directly.

> The entangled deadlock seems to have been introduced by commit
> 3cca6dc1c81e2407928dc4c6105252146fd3924f prior to that, there was no
> synchronous cancel in the destroy path.
>
> A fix might be to shunt more stuff off to workqueues, but that's
> producing a more complex system which would be prone to entanglements
> that would be even harder to spot.

I don't agree there. To me, the cause for entanglement seems to be
request_fn calling all the way through blocking destruction because it
detected that the final put was called with sleepable context. It's
just weird and difficult to anticipate to directly call into sleepable
destruction path from request_fn whether it had sleepable context or
not. With the yet-to-be-debugged bug caused by the conversion aside,
I think simply using workqueue is the better solution.

> Perhaps a better solution is just not to use sync cancellations in
> block? As long as the work in the queue holds a queue ref, they can be
> done asynchronously.

Hmmm... maybe but at least I prefer doing explicit shutdown/draining
on destruction even if the base data structure is refcounted. Things
become much more predictable that way.

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/