[PATCH v4 2/2] md/raid10: fix incorrect done of recovery

From: linan666
Date: Wed May 31 2023 - 05:30:34 EST


From: Li Nan <linan122@xxxxxxxxxx>

In raid10_sync_request(), if data cannot be read from any disk for
recovery, it will go to 'giveup' and let 'chunks_skipped' + 1. After
multiple 'giveup', when 'chunks_skipped >= geo.raid_disks', it will
return 'max_sector', indicating that the recovery has been completed.
However, the recovery is just aborted and the data remains inconsistent.

Fix it by setting mirror->recovery_disabled, which will prevent the spare
disk from being added to this mirror. The same issue also exists during
resync, it will be fixed afterwards.

Signed-off-by: Li Nan <linan122@xxxxxxxxxx>
---
drivers/md/raid10.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index d93d8cb2b620..3ba1516ea160 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
int chunks_skipped = 0;
sector_t chunk_mask = conf->geo.chunk_mask;
int page_idx = 0;
+ int error_disk = -1;

/*
* Allow skipping a full rebuild for incremental assembly
@@ -3386,7 +3387,20 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
return reshape_request(mddev, sector_nr, skipped);

if (chunks_skipped >= conf->geo.raid_disks) {
- /* if there has been nothing to do on any drive,
+ pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
+ test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery");
+ if (error_disk >= 0 &&
+ !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
+ /*
+ * recovery fail, set mirrors.recovory_disabled,
+ * device shouldn't be added to there.
+ */
+ conf->mirrors[error_disk].recovery_disabled =
+ mddev->recovery_disabled;
+ return 0;
+ }
+ /*
+ * if there has been nothing to do on any drive,
* then there is nothing to do at all..
*/
*skipped = 1;
@@ -3638,6 +3652,8 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
mdname(mddev));
mirror->recovery_disabled
= mddev->recovery_disabled;
+ } else {
+ error_disk = i;
}
put_buf(r10_bio);
if (rb2)
--
2.31.1