Re: [RFC PATCH] blk-mq: fixup RESTART when queue becomes idle
From: Bart Van Assche
Date: Thu Jan 18 2018 - 12:21:09 EST
On Thu, 2018-01-18 at 12:03 -0500, Mike Snitzer wrote:
> On Thu, Jan 18 2018 at 11:50am -0500,
> Bart Van Assche <bart.vanassche@xxxxxxx> wrote:
> > My comments about the above are as follows:
> > - It can take up to q->rq_timeout jiffies after a .queue_rq()
> > implementation returned BLK_STS_RESOURCE before blk_mq_timeout_work()
> > gets called. However, it can happen that the condition that caused
> > .queue_rq() to return BLK_STS_RESOURCE gets cleared only a few
> > milliseconds later. So the above approach can result in long delays
> > during which it will seem like the queue got stuck. Additionally, I
> > think that the block driver, and not the block layer core, should
> > decide how long it takes before a queue is rerun.
>
> So configure q->rq_timeout to be shorter? It is configurable through
> blk_mq_tag_set's 'timeout' member and apparently defaults to 30 * HZ.
>
> That is the problem with timeouts: there is generally no
> one-size-fits-all value.
Sorry but I think that would be wrong. The delay after which a queue is rerun
should not be coupled to the request timeout. These two should be independent.
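To be clear about what that knob does, here is a minimal sketch of a blk-mq
driver shortening the per-tag-set request timeout. The my_tag_set and
my_driver_init_tag_set names and the 5 * HZ value are made-up placeholders,
not code from any existing driver:

#include <linux/blk-mq.h>
#include <linux/jiffies.h>
#include <linux/numa.h>

static struct blk_mq_tag_set my_tag_set;

static int my_driver_init_tag_set(struct blk_mq_ops *ops)
{
        my_tag_set.ops = ops;
        my_tag_set.nr_hw_queues = 1;
        my_tag_set.queue_depth = 64;
        my_tag_set.numa_node = NUMA_NO_NODE;
        my_tag_set.flags = BLK_MQ_F_SHOULD_MERGE;
        /* Per-request timeout in jiffies; leaving this zero yields 30 * HZ. */
        my_tag_set.timeout = 5 * HZ;

        return blk_mq_alloc_tag_set(&my_tag_set);
}

That value controls how long a started request may remain outstanding before
the timeout handler runs; reusing it as the delay before an idle queue is
rerun is exactly the coupling I object to above.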
> > - The lockup that I reported only occurs with the dm driver but not any
> > other block driver. So why modify the block layer core since this can
> > be fixed by modifying the dm driver?
>
> Hard to know whether it is only DM's blk-mq that is impacted. That is
> the only blk-mq driver that you're testing like this (and that is also
> able to handle faults, etc).
That's not correct. I'm also testing the SCSI core, which is one of the most
complicated block drivers.
> > - A much simpler fix, and one that is known to work, exists: inserting
> > a blk_mq_delay_run_hw_queue() call in the dm driver.
>
> Because your "much simpler" fix actively hurts performance, as is
> detailed in this header:
> https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.16&id=ec3eaf9a673106f66606896aed6ddd20180b02ec
We are close to the start of the merge window, so I think it's better to fall
back to an old approach that is known to work than to keep a new approach that
is known not to work. Additionally, the performance issue you referred to
affects IOPS and bandwidth by more than 1% only with the lpfc driver, and that
is because the queue depth it supports is much lower than that of other SCSI
HBAs, namely 3 instead of 64.
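For reference, the old approach I am referring to boils down to the sketch
below. The my_queue_rq and my_device_busy names and the 100 ms delay are
placeholders for illustration, not the actual dm-rq code:

#include <linux/blk-mq.h>
#include <linux/blkdev.h>

/* Placeholder for a driver-specific "out of resources" check. */
static bool my_device_busy(struct request_queue *q)
{
        return false;
}

static blk_status_t my_queue_rq(struct blk_mq_hw_ctx *hctx,
                                const struct blk_mq_queue_data *bd)
{
        struct request *rq = bd->rq;

        if (my_device_busy(rq->q)) {
                /*
                 * Rerun this hardware queue after a fixed delay instead of
                 * relying on the RESTART mechanism, so the queue cannot get
                 * stuck if no completion ever arrives to trigger a rerun.
                 */
                blk_mq_delay_run_hw_queue(hctx, 100 /* ms */);
                return BLK_STS_RESOURCE;
        }

        blk_mq_start_request(rq);
        /* ... dispatch rq to the underlying device here ... */
        return BLK_STS_OK;
}

With this approach the driver itself decides how quickly the queue is rerun,
which is the behavior that is known not to lock up.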
Thanks,
Bart.