[RFC PATCH] blk-mq: fix potential I/O hang caused by batch wakeup

From: Yang Yang
Date: Sun May 19 2024 - 23:39:32 EST


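With a tag depth of 62 and a wake_batch of 8, a task can hang forever in
the following situation. t1 sleeps in blk_mq_get_tag() while holding a
q_usage_counter reference, t2 freezes the queue and waits for that
reference to be dropped, and t3 blocks in __bio_queue_enter() because the
queue is frozen. With no new I/O entering the queue, the remaining
completions may not fill a wake batch, so t1 is never woken.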

t1:                 t2:                          t3:
blk_mq_get_tag      .                            .
io_schedule         .                            .
                    elevator_switch              .
                    blk_mq_freeze_queue          .
                    blk_freeze_queue_start       .
                    blk_mq_freeze_queue_wait     .
                                                 blk_mq_submit_bio
                                                 __bio_queue_enter

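Fix this issue by waking up all the waiters sleeping on tags in
blk_freeze_queue_start(), right after the queue starts freezing. The
separate blk_mq_wake_waiters() call in blk_queue_start_drain() becomes
redundant, because that function already calls blk_freeze_queue_start(),
so remove it.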

Signed-off-by: Yang Yang <yang.yang@xxxxxxxx>
---
 block/blk-core.c | 2 --
 block/blk-mq.c   | 4 +++-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index a16b5abdbbf5..e1eacfad6e5b 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -298,8 +298,6 @@ void blk_queue_start_drain(struct request_queue *q)
 	 * prevent I/O from crossing blk_queue_enter().
 	 */
 	blk_freeze_queue_start(q);
-	if (queue_is_mq(q))
-		blk_mq_wake_waiters(q);
 	/* Make blk_queue_enter() reexamine the DYING flag. */
 	wake_up_all(&q->mq_freeze_wq);
 }
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4ecb9db62337..9eb3139e713a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -125,8 +125,10 @@ void blk_freeze_queue_start(struct request_queue *q)
 	if (++q->mq_freeze_depth == 1) {
 		percpu_ref_kill(&q->q_usage_counter);
 		mutex_unlock(&q->mq_freeze_lock);
-		if (queue_is_mq(q))
+		if (queue_is_mq(q)) {
+			blk_mq_wake_waiters(q);
 			blk_mq_run_hw_queues(q, false);
+		}
 	} else {
 		mutex_unlock(&q->mq_freeze_lock);
 	}
--
2.34.1