Re: [PATCH v2 06/11] md: remove MD_RECOVERY_ERROR handling and simplify resync_offset update

From: Li Nan
Date: Mon Nov 10 2025 - 07:18:37 EST




On 2025/11/8 18:22, Yu Kuai wrote:
Hi,

On 2025/11/6 19:59, linan666@xxxxxxxxxxxxxxx wrote:
From: Li Nan <linan122@xxxxxxxxxx>

When sync IO failed and setting badblock also failed, unsynced disk
might be kicked via setting 'recovery_disable' without Faulty flag.
MD_RECOVERY_ERROR was set in md_sync_error() to prevent updating
'resync_offset', avoiding reading the failed sync sectors.

Previous patch ensures disk is marked Faulty when badblock setting fails.
Remove MD_RECOVERY_ERROR handling as it's no longer needed - failed sync
sectors are unreadable either via badblock or Faulty disk.

Simplify resync_offset update logic.

Signed-off-by: Li Nan <linan122@xxxxxxxxxx>
---
drivers/md/md.h | 2 --
drivers/md/md.c | 23 +++++------------------
2 files changed, 5 insertions(+), 20 deletions(-)

diff --git a/drivers/md/md.h b/drivers/md/md.h
index 18621dba09a9..c5b5377e9049 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -644,8 +644,6 @@ enum recovery_flags {
MD_RECOVERY_FROZEN,
/* waiting for pers->start() to finish */
MD_RECOVERY_WAIT,
- /* interrupted because io-error */
- MD_RECOVERY_ERROR,
/* flags determines sync action, see details in enum sync_action */
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2bdbb5b0e9e1..71988d8f5154 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8949,7 +8949,6 @@ void md_sync_error(struct mddev *mddev)
{
// stop recovery, signal do_sync ....
set_bit(MD_RECOVERY_INTR, &mddev->recovery);
- set_bit(MD_RECOVERY_ERROR, &mddev->recovery);
md_wakeup_thread(mddev->thread);
}
EXPORT_SYMBOL(md_sync_error);
@@ -9603,8 +9602,8 @@ void md_do_sync(struct md_thread *thread)
wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
if (!test_bit(MD_RECOVERY_RESHAPE, &mddev->recovery) &&
- !test_bit(MD_RECOVERY_INTR, &mddev->recovery) &&

Why is the above check removed?

Thanks,
Kuai


Before patch 05, a failed sync IO might complete and decrement
recovery_active before its error handling had finished. The error handling
sets recovery_disabled and MD_RECOVERY_INTR, and removes the failed disk
only later. If 'curr_resync_completed' is updated before the disk is
removed, later reads may be served from the regions where sync failed.

After patch 05, the failed IO is guaranteed to have been handled: either a
badblock is recorded or the disk is marked Faulty. Once the wait for
'recovery_active' to reach 0 on the previous line returns, all sync IO has
completed, regardless of whether MD_RECOVERY_INTR is set. Thus, this check
can be removed.
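
To make the ordering argument concrete, here is a small user-space sketch. It is not kernel code: the struct and the model_* helpers are hypothetical names, and it assumes that after patch 05 the error handling (Faulty or badblock) finishes before the failed IO drops its recovery_active reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the relevant mddev state; field names only
 * mirror the kernel ones for readability. */
struct model_mddev {
	int recovery_active;            /* in-flight sync IOs */
	bool intr;                      /* models MD_RECOVERY_INTR */
	bool disk_faulty;               /* models Faulty on the failed disk */
	uint64_t curr_resync;
	uint64_t curr_resync_completed;
};

/* Assumed post-patch-05 ordering: the failure is fully handled first,
 * and only then is the IO accounted as finished. */
static void model_sync_io_error_end(struct model_mddev *m)
{
	assert(m->recovery_active > 0);
	m->disk_faulty = true;          /* or a badblock is recorded */
	m->intr = true;
	m->recovery_active--;
}

/* Models the checkpoint update after the wait on recovery_active.
 * Returns false where the kernel would still be sleeping in wait_event(). */
static bool model_update_checkpoint(struct model_mddev *m)
{
	if (m->recovery_active != 0)
		return false;
	/* All sync IO has completed, so no INTR check is needed here. */
	m->curr_resync_completed = m->curr_resync;
	return true;
}
```

In this model the checkpoint can only advance once every failed IO has already made its region unreadable, which is why the MD_RECOVERY_INTR test adds nothing at that point.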

So I added the following comment:

mddev->curr_resync >= MD_RESYNC_ACTIVE) {
+ /* All sync IO completes after recovery_active becomes 0 */
mddev->curr_resync_completed = mddev->curr_resync;

Since the logic behind this change is complex, should I separate it into a
new commit?

--
Thanks,
Nan