Re: [PATCH v4 4/4] blk-flush: reuse rq queuelist in flush state machine

From: Friedrich Weber
Date: Tue May 28 2024 - 10:41:12 EST


On 28/05/2024 11:09, Chengming Zhou wrote:
> On 2024/5/28 16:42, Friedrich Weber wrote:
>> Hope I did this correctly. With this, the reproducer triggered a BUG
>> pretty quickly, see [0]. If I can provide anything else, just let me know.
>
> Thanks for your patience, it's correct. Then how about this fix?
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index d98654869615..b2ec5c4c738f 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -485,6 +485,7 @@ static struct request *__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
> if (data->nr_tags > 1) {
> rq = __blk_mq_alloc_requests_batch(data);
> if (rq) {
> + INIT_LIST_HEAD(&rq->queuelist);
> blk_mq_rq_time_init(rq, alloc_time_ns);
> return rq;
> }
>

Nice, seems like with this patch applied on top of 6.9, the reproducer
does not trigger crashes anymore for me! Thanks!

To verify that the reproducer hits the new INIT_LIST_HEAD, I added debug
prints before/after:

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 4da581f13273..75186bb0d9c9 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -485,7 +485,9 @@ static struct request
*__blk_mq_alloc_requests(struct blk_mq_alloc_data *data)
if (data->nr_tags > 1) {
rq = __blk_mq_alloc_requests_batch(data);
if (rq) {
+ pr_warn("before init: list: %p next: %p prev:
%p\n", &rq->queuelist, rq->queuelist.next, rq->queuelist.prev);
INIT_LIST_HEAD(&rq->queuelist);
+ pr_warn("after init: list: %p next: %p prev:
%p\n", &rq->queuelist, rq->queuelist.next, rq->queuelist.prev);
blk_mq_rq_time_init(rq, alloc_time_ns);
return rq;
}

And indeed, I see quite some printouts where rq->queuelist.next differs
before/after the request, e.g.

May 28 16:31:25 reproflushfull kernel: before init: list:
000000001e0a144f next: 00000000aaa2e372 prev: 000000001e0a144f
May 28 16:31:25 reproflushfull kernel: after init: list:
000000001e0a144f next: 000000001e0a144f prev: 000000001e0a144f
May 28 16:31:26 reproflushfull kernel: before init: list:
000000001e0a144f next: 00000000aaa2e372 prev: 000000001e0a144f
May 28 16:31:26 reproflushfull kernel: after init: list:
000000001e0a144f next: 000000001e0a144f prev: 000000001e0a144f

I know very little about the block layer, but if I understand the commit
message of the original 81ada09cc25e correctly, it's expected that
queuelist needs to be reinitialized at some places. I'm just a little
confused to see the same pointer 00000000aaa2e372 in two subsequent
"before init" printouts for the same queuelist 000000001e0a144f. Is this
expected too?

Also, just out of interest: Can you estimate whether this issue is
specific to software RAID setups, or could similar NULL pointer
dereferences also happen in setups without software RAID?

Best wishes,

Friedrich