Re: bio linked list corruption.

From: Jens Axboe
Date: Wed Oct 26 2016 - 18:54:21 EST


On 10/26/2016 04:40 PM, Dave Jones wrote:
> On Wed, Oct 26, 2016 at 03:21:53PM -0700, Linus Torvalds wrote:
>
> > Could you try the attached patch? It adds a couple of sanity tests:
> >
> > - a number of tests to verify that 'rq->queuelist' isn't already on
> >   some queue when it is added to a queue
> >
> > - one test to verify that rq->mq_ctx is the same ctx that we have locked.
> >
> > I may be completely full of shit, and this patch may be pure garbage
> > or "obviously will never trigger", but humor me.
>
> I gave it a shot too for shits & giggles.
> This falls out during boot.
>
> [ 9.244030] EXT4-fs (sda4): mounted filesystem with ordered data mode. Opts: (null)
> [ 9.271391] ------------[ cut here ]------------
> [ 9.278420] WARNING: CPU: 0 PID: 1 at block/blk-mq.c:1181 blk_sq_make_request+0x465/0x4a0
> [ 9.285613] CPU: 0 PID: 1 Comm: init Not tainted 4.9.0-rc2-think+ #4

Very odd, don't immediately see how that can happen. For testing, can
you try and add the below patch? Just curious if that fixes the list
corruption. Thing is, I don't see how ->mq_ctx and ctx are different in
this path, but I can debug that on the side.
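
To spell out what the test patch is guarding against, here is a minimal
userspace sketch, not kernel code. The names ctx_model, rq_model,
insert_rq, queue_with_passed_ctx and queue_with_rq_ctx are hypothetical
stand-ins, and the sketch assumes the insert always lands on the list
that belongs to rq->mq_ctx. If the caller locks a ctx that was passed in
separately and it differs from rq->mq_ctx, the list is modified without
its lock held. Deriving the ctx from the request removes that possibility.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, not the kernel's structures. */
struct ctx_model {
	bool locked;     /* stands in for ctx->lock being held */
	int  nr_queued;  /* stands in for the length of the ctx's request list */
};

struct rq_model {
	struct ctx_model *mq_ctx;  /* ctx the request belongs to */
};

/*
 * Queue rq on the list of its own ctx (assumption: the insert path always
 * uses rq->mq_ctx). Returns true only if that ctx's lock was held.
 */
static bool insert_rq(struct rq_model *rq)
{
	assert(rq != NULL && rq->mq_ctx != NULL);

	bool lock_held = rq->mq_ctx->locked;

	rq->mq_ctx->nr_queued++;
	return lock_held;
}

/* Old shape: the ctx to lock is passed in and may not match rq->mq_ctx. */
static bool queue_with_passed_ctx(struct ctx_model *ctx, struct rq_model *rq)
{
	bool lock_held;

	ctx->locked = true;          /* models spin_lock(&ctx->lock) */
	lock_held = insert_rq(rq);
	ctx->locked = false;         /* models spin_unlock(&ctx->lock) */
	return lock_held;
}

/* Patched shape: the ctx to lock is taken from the request itself. */
static bool queue_with_rq_ctx(struct rq_model *rq)
{
	return queue_with_passed_ctx(rq->mq_ctx, rq);
}
```

With two contexts a and b and a request whose mq_ctx is b,
queue_with_passed_ctx(&a, &rq) reports an unprotected insert, while
queue_with_rq_ctx(&rq) cannot. This only models the locking mismatch; it
says nothing about how the two pointers end up different in the first
place.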


diff --git a/block/blk-mq.c b/block/blk-mq.c
index ddc2eed64771..73b9462aa21f 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1165,9 +1165,10 @@ static inline bool hctx_allow_merges(struct blk_mq_hw_ctx *hctx)
 }
 
 static inline bool blk_mq_merge_queue_io(struct blk_mq_hw_ctx *hctx,
-					 struct blk_mq_ctx *ctx,
 					 struct request *rq, struct bio *bio)
 {
+	struct blk_mq_ctx *ctx = rq->mq_ctx;
+
 	if (!hctx_allow_merges(hctx) || !bio_mergeable(bio)) {
 		blk_mq_bio_to_request(rq, bio);
 		spin_lock(&ctx->lock);
@@ -1338,7 +1339,7 @@ static blk_qc_t blk_mq_make_request(struct request_queue *q, struct bio *bio)
 		goto done;
 	}
 
-	if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
+	if (!blk_mq_merge_queue_io(data.hctx, rq, bio)) {
 		/*
 		 * For a SYNC request, send it to the hardware immediately. For
 		 * an ASYNC request, just ensure that we run it later on. The
@@ -1416,7 +1417,7 @@ static blk_qc_t blk_sq_make_request(struct request_queue *q, struct bio *bio)
 		return cookie;
 	}
 
-	if (!blk_mq_merge_queue_io(data.hctx, data.ctx, rq, bio)) {
+	if (!blk_mq_merge_queue_io(data.hctx, rq, bio)) {
 		/*
 		 * For a SYNC request, send it to the hardware immediately. For
 		 * an ASYNC request, just ensure that we run it later on. The

--
Jens Axboe