On Mon, May 04, 2015 at 01:56:42PM -0600, Jens Axboe wrote:
On 05/04/2015 01:51 PM, Shaohua Li wrote:
On Mon, May 04, 2015 at 01:17:19PM -0600, Jens Axboe wrote:
On 05/02/2015 06:31 PM, Shaohua Li wrote:
Normally, if the driver is too busy to dispatch a request, the logic is as follows:
block layer:                            driver:
    __blk_mq_run_hw_queue
a.                                      blk_mq_stop_hw_queue
b.  rq add to ctx->dispatch

later:
1.  blk_mq_start_hw_queue
2.  __blk_mq_run_hw_queue
But it's possible that steps 1-2 run between a and b. And since the rq isn't in
ctx->dispatch yet, step 2 will not run the rq. The rq might get lost if no
subsequent requests come in.
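For context, the tail of __blk_mq_run_hw_queue() looks roughly like the sketch
below (paraphrased, not a verbatim quote of blk-mq.c); the window is between
the driver stopping the queue from ->queue_rq() (step a) and the leftover
requests being spliced back onto hctx->dispatch (step b):

        /* paraphrased sketch of the relevant part of __blk_mq_run_hw_queue() */
        struct request_queue *q = hctx->queue;
        struct request *rq;
        LIST_HEAD(rq_list);
        int ret;

        /* ... rq_list is filled from the software queues ... */

        while (!list_empty(&rq_list)) {
                struct blk_mq_queue_data bd;

                rq = list_first_entry(&rq_list, struct request, queuelist);
                list_del_init(&rq->queuelist);
                bd.rq = rq;

                ret = q->mq_ops->queue_rq(hctx, &bd);
                if (ret == BLK_MQ_RQ_QUEUE_BUSY) {
                        /* driver typically calls blk_mq_stop_hw_queue() here: step a */
                        list_add(&rq->queuelist, &rq_list);
                        __blk_mq_requeue_request(rq);
                        break;
                }
        }

        if (!list_empty(&rq_list)) {
                /*
                 * step b: leftover requests go back onto hctx->dispatch.  A
                 * restart (steps 1-2) that slips in between a and b sees an
                 * empty dispatch list, so the requeued rq is stranded until
                 * something else runs the queue.
                 */
                spin_lock(&hctx->lock);
                list_splice(&rq_list, &hctx->dispatch);
                spin_unlock(&hctx->lock);
        }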
Good catch! But the patch introduces a potentially never-ending loop
in __blk_mq_run_hw_queue(). Not sure how we can fully close it, but
it might be better to punt the re-run to the worker after adding the
requests back. That would turn a potential busy loop (until requests
complete) into something with nicer behavior, at least. A la:
        if (!test_bit(BLK_MQ_S_STOPPED, &hctx->state))
                kblockd_schedule_delayed_work_on(blk_mq_hctx_next_cpu(hctx),
                                                 &hctx->run_work, 0);
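(That just kicks hctx->run_work onto the kblockd workqueue with no delay, so
the re-run happens out of line instead of looping in __blk_mq_run_hw_queue()
while the driver is still out of resources.)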
My first version of the patch was like this, but I changed my mind later.
The assumption is that the driver will stop the queue if it's too busy to
dispatch a request. If the driver is buggy, we will have an endless loop
here. Should we assume drivers will not do the right thing?
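The pattern I'm assuming is roughly the below (hypothetical driver, sketch
only; my_dev, my_dev_has_resources, my_dev_issue and my_complete are made-up
names):

        static int my_queue_rq(struct blk_mq_hw_ctx *hctx,
                               const struct blk_mq_queue_data *bd)
        {
                struct my_dev *dev = hctx->queue->queuedata;

                if (!my_dev_has_resources(dev)) {
                        /* out of device slots: stop the queue and report busy */
                        blk_mq_stop_hw_queue(hctx);
                        return BLK_MQ_RQ_QUEUE_BUSY;
                }

                my_dev_issue(dev, bd->rq);
                return BLK_MQ_RQ_QUEUE_OK;
        }

        /* completion path: the slot is free again, restart and re-run */
        static void my_complete(struct my_dev *dev, struct request *rq)
        {
                blk_mq_end_request(rq, 0);
                blk_mq_start_stopped_hw_queues(dev->queue, true);
        }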
There's really no contract that says the driver MUST stop the queue
for busy. It could, legitimately, decide to just always run the
queue when requests complete.
It might be better to simply force this behavior. If we get a BUSY,
stop the queue from __blk_mq_run_hw_queue(). And if the bit isn't
still set on re-add, then we know we need to re-run it. I think that
would be a cleaner API, less fragile, and harder to get wrong. The
downside is that this stop now happens implicitly in the core, and
the driver must now have an asymmetric queue start when it frees the
limited resource that caused the BUSY return. Either that, or we
define a 2nd set of start/stop bits, one used exclusively by the
driver and one used exclusively by blk-mq. Then blk-mq could restart
the queue on completion of a request, since it would then know that
blk-mq was the one that stopped it.
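Rough sketch of the second set of bits idea (BLK_MQ_S_CORE_STOPPED and the
helpers below are made-up names, nothing like this exists today):

        /* hypothetical stop bit owned exclusively by the blk-mq core */
        #define BLK_MQ_S_CORE_STOPPED   31

        /* dispatch path: driver returned BLK_MQ_RQ_QUEUE_BUSY, core stops the queue */
        static void blk_mq_core_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
        {
                set_bit(BLK_MQ_S_CORE_STOPPED, &hctx->state);
        }

        /*
         * completion path: restart only if the core stopped it, so a
         * driver-initiated blk_mq_stop_hw_queue() is left alone
         */
        static void blk_mq_core_restart_hw_queue(struct blk_mq_hw_ctx *hctx)
        {
                if (test_and_clear_bit(BLK_MQ_S_CORE_STOPPED, &hctx->state) &&
                    !test_bit(BLK_MQ_S_STOPPED, &hctx->state))
                        blk_mq_run_hw_queue(hctx, true);
        }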
Agreed. I'll make the rerun async for now and leave the above as a future
improvement.
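Concretely, something along these lines where the leftover requests are put
back (untested sketch; blk_mq_run_hw_queue() already checks the STOPPED bit
and, with async, punts to the worker):

        if (!list_empty(&rq_list)) {
                spin_lock(&hctx->lock);
                list_splice(&rq_list, &hctx->dispatch);
                spin_unlock(&hctx->lock);
                /*
                 * The queue is expected to be stopped when we get BUSY, but
                 * it may have been stopped and restarted again before the
                 * splice above.  That restart saw an empty dispatch list, so
                 * kick the queue once more.  Async, so a buggy driver can't
                 * busy-loop us here.
                 */
                blk_mq_run_hw_queue(hctx, true);
        }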