Re: [PATCH] SCSI: run queue if SCSI device queue isn't ready and queue is idle

From: Ming Lei
Date: Wed Dec 06 2017 - 20:40:26 EST


On Thu, Dec 07, 2017 at 12:10:51AM +0100, Holger Hoffstätte wrote:
> On 12/05/17 08:52, Ming Lei wrote:
> > Before commit 0df21c86bdbf ("scsi: implement .get_budget and .put_budget
> > for blk-mq"), the queue was re-run after 3ms if it was idle and the SCSI
> > device queue wasn't ready; this was done when handling BLK_STS_RESOURCE.
> > Since commit 0df21c86bdbf, the queue is no longer run in this situation.
> >
> > An IO hang is observed when a timeout happens. This patch fixes the hang
> > by running the queue after a delay when scsi_mq_get_budget() fails to get
> > budget (i.e. scsi_dev_queue_ready() says no), just like non-mq does. The
> > issue can be triggered by the script in [1].
> >
> > Running an idle queue also covers another issue: when .get_budget() is
> > called for a request taken from hctx->dispatch_list and a request
> > completes while .get_budget() is running, we can no longer rely on
> > SCSI's restart to make progress. This patch fixes that race too.
> >
> > With this patch, we basically return to the previous behaviour (before
> > commit 0df21c86bdbf) of handling an idle queue when running out of resources.
> >
> > [1] script to test/verify SCSI timeout handling
> > rmmod scsi_debug
> > modprobe scsi_debug max_queue=1
> >
> > DEVICE=$(ls -d /sys/bus/pseudo/drivers/scsi_debug/adapter*/host*/target*/*/block/* | head -1 | xargs basename)
> > DISK_DIR=$(ls -d /sys/block/"$DEVICE"/device/scsi_disk/*)
> >
> > echo "using scsi device $DEVICE"
> > echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
> > echo "temporary write through" >"$DISK_DIR/cache_type"
> > echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
> > echo none >"/sys/block/$DEVICE/queue/scheduler"
> > dd if="/dev/$DEVICE" of=/dev/null bs=1M iflag=direct count=1 &
> > sleep 5
> > echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
> > wait
> > echo "SUCCESS"
> >
> > Fixes: 0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq")
> > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > ---
> > drivers/scsi/scsi_lib.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> > index db9556662e27..1816dd8259b3 100644
> > --- a/drivers/scsi/scsi_lib.c
> > +++ b/drivers/scsi/scsi_lib.c
> > @@ -1967,6 +1967,8 @@ static bool scsi_mq_get_budget(struct blk_mq_hw_ctx *hctx)
> >  out_put_device:
> >  	put_device(&sdev->sdev_gendev);
> >  out:
> > +	if (atomic_read(&sdev->device_busy) == 0 && !scsi_device_blocked(sdev))
> > +		blk_mq_delay_run_hw_queue(hctx, SCSI_QUEUE_DELAY);
> >  	return false;
> >  }
>
> So just to follow up on this: with this patch I haven't encountered any
> new hangs with blk-mq, regardless of medium (SSD/rotating disk) or scheduler.
> I cannot speak for other hangs that may be reproducible by other means,
> but for now here's my:
>
> Tested-by: Holger Hoffstätte <holger@xxxxxxxxxxxxxxxxxxxxxx>

Hi Holger,

It's great to see that this patch fixes your issue, and thanks for
testing!

Jens, Martin, would either of you mind merging this patch for v4.15? It
fixes real use cases, and it restores exactly what we did before
0df21c86bdbf ("scsi: implement .get_budget and .put_budget for blk-mq").


Thanks,
Ming