Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery

From: Li Nan
Date: Thu May 25 2023 - 10:00:52 EST

Next in thread: Yu Kuai: "Re: [PATCH 2/3] md/raid10: fix incorrect done of recovery"

On 2023/5/22 21:54, Yu Kuai wrote:

Hi,

On 2023/05/22 19:54, linan666@xxxxxxxxxxxxxxx wrote:

From: Li Nan <linan122@xxxxxxxxxx>

If there are bad blocks, recovery jumps to giveup and increments
chunks_skipped in raid10_sync_request(), which returns max_sector once
chunks_skipped >= geo.raid_disks. At that point recovery has failed and
the data is inconsistent, yet the user believes recovery is done. This is
wrong.

Fix it by setting the mirror's recovery_disabled; the spare device
shouldn't be added here.

Signed-off-by: Li Nan <linan122@xxxxxxxxxx>
---
drivers/md/raid10.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index e21502c03b45..70cc87c7ee57 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3303,6 +3303,7 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
      int chunks_skipped = 0;
      sector_t chunk_mask = conf->geo.chunk_mask;
      int page_idx = 0;
+    int error_disk = -1;
      /*
       * Allow skipping a full rebuild for incremental assembly
@@ -3386,7 +3387,18 @@ static sector_t raid10_sync_request(struct mddev *mddev, sector_t sector_nr,
          return reshape_request(mddev, sector_nr, skipped);
      if (chunks_skipped >= conf->geo.raid_disks) {
-        /* if there has been nothing to do on any drive,
+        pr_err("md/raid10:%s: %s fail\n", mdname(mddev),
+            test_bit(MD_RECOVERY_SYNC, &mddev->recovery) ? "resync" : "recovery");

This line exceeds 80 columns, as do the following ones.

+        if (error_disk >= 0 && !test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {

Resync has the same problem, right?

Yes, but I don't know how to fix it. Neither calling md_error() on the disk nor setting recovery_disabled is a good solution, so for now I just print an error message.
Do you have any ideas?
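
To make the problem concrete, here is a toy user-space model of the skip accounting described in the commit message. It is only a sketch and not kernel code: toy_sync_request(), struct toy_sync_result, chunk_bad and report_failure are names made up for illustration, and the real raid10_sync_request() does far more than this. It only shows that once every attempt has been skipped, the remaining range is reported as handled, and that the caller cannot tell this apart from success unless a failure is flagged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical result of one toy sync pass; not a kernel type. */
struct toy_sync_result {
	sector_t sectors_reported;	/* what the caller treats as handled */
	bool failed;			/* only set when report_failure is true */
};

/*
 * Toy model of the skip accounting (assumption: one attempt per disk).
 * chunk_bad[i] is true when attempt i hits a bad block and is skipped.
 */
static struct toy_sync_result
toy_sync_request(const bool *chunk_bad, int raid_disks,
		 sector_t sector_nr, sector_t max_sector,
		 sector_t chunk_sectors, bool report_failure)
{
	struct toy_sync_result res = { 0, false };
	int chunks_skipped = 0;

	assert(raid_disks > 0 && sector_nr <= max_sector);

	while (chunks_skipped < raid_disks) {
		if (!chunk_bad[chunks_skipped]) {
			/* A chunk can be synced: report normal progress. */
			res.sectors_reported = chunk_sectors;
			return res;
		}
		chunks_skipped++;	/* models the "giveup" path */
	}

	/*
	 * Every attempt was skipped: the rest of the range is reported
	 * as handled. Without a failure flag this looks like success.
	 */
	res.sectors_reported = max_sector - sector_nr;
	res.failed = report_failure;
	return res;
}
```

With all attempts bad and report_failure set to false, the model reports the whole remaining range and no failure, which mirrors the "recovery is done" misreport. The open question for resync is what the equivalent of report_failure should be.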
--
Thanks,
Nan
