Re: [PATCH] blk-mq: set BLK_MQ_S_STOPPED first to avoid unexpected queue work

From: Bart Van Assche
Date: Wed Jun 29 2022 - 14:23:41 EST


On 6/28/22 22:18, Liu Song wrote:
> From: Liu Song <liusong@xxxxxxxxxxxxxxxxx>
>
> In "__blk_mq_delay_run_hw_queue", BLK_MQ_S_STOPPED is checked first,
> and then queue work, but in "blk_mq_stop_hw_queue", execute cancel
> work first and then set BLK_MQ_S_STOPPED, so there is a risk of
> queue work after setting BLK_MQ_S_STOPPED, which can be solved by
> adjusting the order.
>
> Signed-off-by: Liu Song <liusong@xxxxxxxxxxxxxxxxx>
> ---
>  block/blk-mq.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 93d9d60..865915e 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2258,9 +2258,9 @@ bool blk_mq_queue_stopped(struct request_queue *q)
>   */
>  void blk_mq_stop_hw_queue(struct blk_mq_hw_ctx *hctx)
>  {
> -	cancel_delayed_work(&hctx->run_work);
> -
>  	set_bit(BLK_MQ_S_STOPPED, &hctx->state);
> +
> +	cancel_delayed_work(&hctx->run_work);
>  }
>  EXPORT_SYMBOL(blk_mq_stop_hw_queue);

What made you come up with this patch? Source code reading or something
else? Please mention this in the patch description.

Regarding the patch itself, I don't think it fixes the existing race
between blk_mq_stop_hw_queue() and __blk_mq_delay_run_hw_queue(), not
even if cancel_delayed_work_sync() were used.
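
To illustrate why reordering alone may not help, here is a minimal
user-space sketch of one possible interleaving. It is a toy model, not
kernel code: struct toy_hctx and every function name below are
hypothetical, and it assumes (as the patch description says) that the
run path checks the stopped flag and then queues the work as two
separate steps, with nothing preventing the stop path from running in
between. Cancelling is modeled as instantaneous, so the model does not
distinguish cancel_delayed_work() from cancel_delayed_work_sync().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one hardware queue; NOT the kernel's struct blk_mq_hw_ctx. */
struct toy_hctx {
	bool stopped;      /* stands in for BLK_MQ_S_STOPPED */
	bool work_pending; /* stands in for a queued hctx->run_work */
};

/* Run path, step 1: sample the stopped flag. */
bool runner_sees_stopped(const struct toy_hctx *h)
{
	return h->stopped;
}

/* Run path, step 2: queue the work if step 1 saw "not stopped". */
void runner_queue_work(struct toy_hctx *h, bool saw_stopped)
{
	if (!saw_stopped)
		h->work_pending = true;
}

/* Stop path, split into its two steps so they can be reordered. */
void stopper_set_stopped(struct toy_hctx *h)
{
	h->stopped = true;
}

void stopper_cancel_work(struct toy_hctx *h)
{
	h->work_pending = false;
}

/*
 * Patched order (set the flag, then cancel), with the whole stop path
 * running between the run path's check and its queueing step.
 * Returns true if work ends up pending on a stopped queue.
 */
bool race_with_patched_order(void)
{
	struct toy_hctx h = { false, false };
	bool saw = runner_sees_stopped(&h); /* sees "not stopped" */

	stopper_set_stopped(&h);
	stopper_cancel_work(&h);            /* nothing is pending yet */
	runner_queue_work(&h, saw);         /* acts on the stale check */
	return h.stopped && h.work_pending;
}

/* Original order (cancel, then set the flag), same interleaving. */
bool race_with_original_order(void)
{
	struct toy_hctx h = { false, false };
	bool saw = runner_sees_stopped(&h);

	stopper_cancel_work(&h);
	stopper_set_stopped(&h);
	runner_queue_work(&h, saw);
	return h.stopped && h.work_pending;
}

/* Both orderings leave work pending on a stopped queue in this model. */
void check_both_orders(void)
{
	assert(race_with_patched_order());
	assert(race_with_original_order());
}
```

Calling check_both_orders() from a main() passes without tripping an
assertion: in this model the outcome is the same whichever order the
stop path uses, because the run path acts on a check made before the
stop path started.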

The comment block above blk_mq_stop_hw_queue() clearly states that this
function is not guaranteed to stop the dispatching of requests
immediately. So why bother fixing existing race conditions that do not
affect what blk_mq_stop_hw_queue() guarantees?

Thanks,

Bart.