Re: [RFC PATCH] raid1: reset 'bi_next' before reuse the bio

From: NeilBrown
Date: Tue Apr 04 2017 - 18:18:53 EST


On Tue, Apr 04 2017, Michael Wang wrote:

> During testing we found that the sync read bio can go through
> this path:
>
> md_do_sync()
> sync_request()
> generic_make_request()
> blk_queue_bio()
> blk_attempt_plug_merge()
> bio->bi_next CHAINED HERE
>
> ...
>
> raid1d()
> sync_request_write()
> fix_sync_read_error()
> if FailFast && Faulty
> bio->bi_end_io = end_sync_write
> generic_make_request()
> BUG_ON(bio->bi_next)
>
> This requires all of the following conditions:
> * bio once merged
> * read disk have FailFast enabled
> * read disk is Faulty
>
> Since the block layer doesn't reset 'bi_next' after a merged bio
> completes inside its request, we hit the BUG above.
>
> This patch simply resets bi_next before the bio is reused.
>
> Signed-off-by: Michael Wang <yun.wang@xxxxxxxxxxxxxxxx>
> ---
> drivers/md/raid1.c | 4 +++-
> 1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 7d67235..0554110 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1986,11 +1986,13 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
> /* Don't try recovering from here - just fail it
> * ... unless it is the last working device of course */
> md_error(mddev, rdev);
> - if (test_bit(Faulty, &rdev->flags))
> + if (test_bit(Faulty, &rdev->flags)) {
> /* Don't try to read from here, but make sure
> * put_buf does it's thing
> */
> bio->bi_end_io = end_sync_write;
> + bio->bi_next = NULL;
> + }
> }
>
> while(sectors) {


Ah - I see what is happening now. I was looking at the vanilla 4.4
code, which doesn't have the failfast changes.

I don't think your patch is correct though. We really shouldn't be
re-using that bio, and setting bi_next to NULL just hides the bug. It
doesn't fix it.
As the rdev is now Faulty, it doesn't make sense for
sync_request_write() to submit a write request to it.
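To make the "hides the bug" point concrete, here is a minimal userspace model of the bi_next lifecycle from the report. It is a sketch under stated assumptions, not kernel code: `struct toy_bio`, `toy_plug_merge()`, `toy_complete()`, `toy_submit_ok()` and `toy_reset_hides_bug()` are hypothetical stand-ins for `struct bio`, the plug merge, request completion and the `BUG_ON(bio->bi_next)` check in `generic_make_request()`. It shows that a stale link survives completion, and that clearing it silences the check while the bio still targets the Faulty device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct bio; NOT the kernel type. */
struct toy_bio {
	struct toy_bio *bi_next; /* link set when bios are merged into one request */
	bool dev_faulty;         /* true if the device this bio targets is Faulty */
};

/* Models a back merge: the new bio is chained after the request's tail bio. */
static void toy_plug_merge(struct toy_bio *tail, struct toy_bio *bio)
{
	tail->bi_next = bio;
}

/* Models request completion; per the report, bi_next is left as it was. */
static void toy_complete(struct toy_bio *bio)
{
	(void)bio;
}

/* Models the submission sanity check: false where BUG_ON(bio->bi_next) fires. */
static bool toy_submit_ok(const struct toy_bio *bio)
{
	return bio->bi_next == NULL;
}

/* Replays the reported sequence. Returns true if clearing bi_next lets a bio
 * aimed at a Faulty device pass the check, i.e. the symptom is only hidden. */
static bool toy_reset_hides_bug(void)
{
	struct toy_bio first = { NULL, true };   /* sync read bio on the Faulty disk */
	struct toy_bio second = { NULL, false }; /* a bio merged behind it */

	toy_plug_merge(&first, &second);
	toy_complete(&first);
	assert(!toy_submit_ok(&first)); /* stale bi_next would trip BUG_ON */

	first.bi_next = NULL;           /* the proposed reset */
	return toy_submit_ok(&first) && first.dev_faulty;
}
```

The model only captures why the check fires; it says nothing about whether a write to the Faulty device is sensible, which is the objection raised here.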

Can you confirm that this works, please?

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d2d8b8a5bd56..219f1e1f1d1d 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2180,6 +2180,8 @@ static void sync_request_write(struct mddev *mddev, struct r1bio *r1_bio)
(i == r1_bio->read_disk ||
!test_bit(MD_RECOVERY_SYNC, &mddev->recovery))))
continue;
+ if (test_bit(Faulty, &conf->mirrors[i].rdev->flags))
+ continue;

bio_set_op_attrs(wbio, REQ_OP_WRITE, 0);
if (test_bit(FailFast, &conf->mirrors[i].rdev->flags))

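The effect of the added check can be sketched in userspace as well. Everything here is hypothetical (`enum toy_end_io`, `struct toy_mirror`, `toy_sync_write_targets()`, `toy_demo()`), and the skip condition before the new check is an assumption about the lines the hunk only partly shows (a `bi_end_io == NULL` test and an `end_sync_read` test), reduced to its effect when MD_RECOVERY_SYNC is set. The point it illustrates: once fix_sync_read_error() switches the Faulty read disk's bio to end_sync_write, the read-disk exemption no longer applies, so only the Faulty test keeps that mirror out of the write loop.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of which completion handler a per-mirror bio carries. */
enum toy_end_io { TOY_END_NONE, TOY_END_SYNC_READ, TOY_END_SYNC_WRITE };

/* Hypothetical per-mirror state; NOT struct raid1_info or struct md_rdev. */
struct toy_mirror {
	enum toy_end_io end_io; /* models r1_bio->bios[i]->bi_end_io */
	bool faulty;            /* models test_bit(Faulty, &rdev->flags) */
};

/* Returns a bitmask of mirrors that would get a sync write (n <= 32).
 * The first test is the assumed skip condition with MD_RECOVERY_SYNC set;
 * skip_faulty toggles the check added by the proposed patch. */
static unsigned toy_sync_write_targets(const struct toy_mirror *m, int n,
				       int read_disk, bool skip_faulty)
{
	unsigned mask = 0;

	for (int i = 0; i < n && i < 32; i++) {
		if (m[i].end_io == TOY_END_NONE ||
		    (m[i].end_io == TOY_END_SYNC_READ && i == read_disk))
			continue;
		if (skip_faulty && m[i].faulty)
			continue;
		mask |= 1u << i;
	}
	return mask;
}

/* The reported case: disk 0 was read from, failed under FailFast, became
 * Faulty, and had its end_io switched to the write handler. */
static void toy_demo(void)
{
	const struct toy_mirror m[2] = {
		{ TOY_END_SYNC_WRITE, true },  /* read disk, now Faulty */
		{ TOY_END_SYNC_WRITE, false }, /* healthy mirror to be written */
	};

	assert(toy_sync_write_targets(m, 2, 0, false) == 0x3u); /* Faulty disk written */
	assert(toy_sync_write_targets(m, 2, 0, true) == 0x2u);  /* only mirror 1 */
}
```

With a healthy read disk the new check changes nothing, since that mirror is still exempted by its end_sync_read handler.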

Thanks,
NeilBrown
