Re: [RFC PATCH] blk-mq: Clean up references when freeing rqs

From: John Garry
Date: Wed Dec 02 2020 - 06:20:03 EST


On 02/12/2020 03:31, Ming Lei wrote:
On Tue, Dec 01, 2020 at 09:02:18PM +0800, John Garry wrote:
It has been reported many times that a use-after-free can be intermittently
found when iterating busy requests:

- https://lore.kernel.org/linux-block/8376443a-ec1b-0cef-8244-ed584b96fa96@xxxxxxxxxx/
- https://lore.kernel.org/linux-block/5c3ac5af-ed81-11e4-fee3-f92175f14daf@xxxxxxx/T/#m6c1ac11540522716f645d004e2a5a13c9f218908
- https://lore.kernel.org/linux-block/04e2f9e8-79fa-f1cb-ab23-4a15bf3f64cc@xxxxxxxxx/

The issue is that when we switch scheduler or change queue nr_requests,
the driver tagset may keep references to the stale requests.

As a solution, clean up any references to those requests in the driver
tagset when freeing. This is done with a cmpxchg to make safe any race
with setting the driver tagset request from another queue.

Signed-off-by: John Garry <john.garry@xxxxxxxxxx>
--
Set as RFC as I need to test more. And not sure on solution method, as
Bart had another idea.

diff --git a/block/blk-mq-sched.c b/block/blk-mq-sched.c
index d1eafe2c045c..9b042c7036b3 100644
--- a/block/blk-mq-sched.c
+++ b/block/blk-mq-sched.c
@@ -621,7 +621,7 @@ void blk_mq_sched_free_requests(struct request_queue *q)
queue_for_each_hw_ctx(q, hctx, i) {
if (hctx->sched_tags)
- blk_mq_free_rqs(q->tag_set, hctx->sched_tags, i);
+ blk_mq_free_rqs_ext(q->tag_set, hctx->sched_tags, i, hctx->tags);
}
}
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 9c92053e704d..562db72e7d79 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -576,7 +576,7 @@ int blk_mq_tag_update_depth(struct blk_mq_hw_ctx *hctx,
return -ENOMEM;
}
- blk_mq_free_rqs(set, *tagsptr, hctx->queue_num);
+ blk_mq_free_rqs_ext(set, *tagsptr, hctx->queue_num, hctx->tags);
blk_mq_free_rq_map(*tagsptr, flags);
*tagsptr = new;
} else {
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 55bcee5dc032..f3aad695cd25 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2271,8 +2271,8 @@ blk_qc_t blk_mq_submit_bio(struct bio *bio)
return BLK_QC_T_NONE;
}
-void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
- unsigned int hctx_idx)
+void blk_mq_free_rqs_ext(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
+ unsigned int hctx_idx, struct blk_mq_tags *references)
{
struct page *page;
@@ -2281,10 +2281,13 @@ void blk_mq_free_rqs(struct blk_mq_tag_set *set, struct blk_mq_tags *tags,
for (i = 0; i < tags->nr_tags; i++) {
struct request *rq = tags->static_rqs[i];
+ int j;
if (!rq)
continue;
set->ops->exit_request(set, rq, hctx_idx);
+ for (j = 0; references && j < references->nr_tags; j++)
+ cmpxchg(&references->rqs[j], rq, 0);

Seems you didn't address the comment in the following link:

https://lore.kernel.org/linux-block/10331543-9e45-ae63-8cdb-17e5a2a3b7ef@xxxxxxxxxx/

The request to be freed may still be refered in another path, such as blk_mq_queue_tag_busy_iter
or blk_mq_tagset_busy_iter(), and cmpxchg() doesn't drain/wait for other refers.


Hi Ming,

Yeah, so I said that was another problem which you mentioned there, which I'm not addressing, but I don't think that I'm making thing worse here.

So AFAICS, the blk-mq/sched code doesn't wait for any "readers" to be finished, such as those running blk_mq_queue_tag_busy_iter or blk_mq_tagset_busy_iter() in another context.

So how about the idea of introducing some synchronization primitive, such as semaphore, which those "readers" must grab and release at start and end (of iter), to ensure the requests are not freed during the iteration?

Thanks,
John