Re: [PATCH] scsi-mq: fix hw queue hang caused by timeout

From: Jens Axboe
Date: Thu Sep 18 2014 - 13:04:07 EST


On 2014-09-18 10:35, Christoph Hellwig wrote:
> On Thu, Sep 18, 2014 at 11:59:10PM +0800, Ming Lei wrote:
>> If two or more requests time out, the dispatch queue is put
>> into the stopped state and is never recovered. There is no
>> such problem in non-mq mode.
>>
>> This patch tries to recover the stopped queue when the queue
>> becomes unbusy, so that the following retries can move on.
>>
>> Basically, this patch maintains the same behavior as non-mq
>> mode for this situation.
>
> This looks somewhat similar to the issues that Doug reported, and I remember
> when he was last running into boot problems it was timeout related, too.
>
> As far as the implementation is concerned I think the correct fix is
> to clear the BLK_MQ_S_STOPPED queue flags in blk_mq_kick_requeue_list.

Since that's the kick part of the requeue, auto-starting the queue for that makes a lot of sense. I say that's the way we go.
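
To illustrate the idea, here is a minimal userspace sketch, not the actual kernel change. It assumes a toy hardware-queue model in which requeueing a timed-out request also leaves the queue stopped. All model_* names and MODEL_S_STOPPED are hypothetical stand-ins; only BLK_MQ_S_STOPPED and blk_mq_kick_requeue_list come from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: stands in for BLK_MQ_S_STOPPED, not the kernel definition. */
#define MODEL_S_STOPPED (1u << 0)
#define MODEL_MAX_REQS  8

struct model_hctx {
	unsigned int state;                   /* bit flags, e.g. MODEL_S_STOPPED */
	int requeue_list[MODEL_MAX_REQS];     /* tags waiting to be retried */
	size_t nr_requeued;
	size_t nr_dispatched;
};

/* Assumed timeout path: park the request and leave the queue stopped. */
static bool model_requeue(struct model_hctx *h, int tag)
{
	if (h->nr_requeued == MODEL_MAX_REQS)
		return false;
	h->requeue_list[h->nr_requeued++] = tag;
	h->state |= MODEL_S_STOPPED;
	return true;
}

/* Dispatch everything on the requeue list, unless the queue is stopped. */
static size_t model_run_queue(struct model_hctx *h)
{
	size_t n;

	if (h->state & MODEL_S_STOPPED)
		return 0;                     /* a stopped queue makes no progress */
	n = h->nr_requeued;
	h->nr_dispatched += n;
	h->nr_requeued = 0;
	return n;
}

/* The "kick": auto-start the queue by clearing the stopped bit, then run it. */
static size_t model_kick_requeue_list(struct model_hctx *h)
{
	assert(h != NULL);
	h->state &= ~MODEL_S_STOPPED;
	return model_run_queue(h);
}
```

In this model, running the queue after two requeues dispatches nothing, while a kick clears the stopped bit and dispatches both requests. That is the hang and the recovery described above, in miniature.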

--
Jens Axboe
