Re: [PATCH] raid10: badblock-aware reshape write error handling

From: caogh

Date: Fri Jun 12 2026 - 04:05:43 EST

Nan, thanks for the review.

You're right that the current code chains several mechanisms (badblock recording, WantReplacement, md_error) together, which hurts readability. I'll refactor in the next version to make the logic clearer.

However, regarding your suggested logic:

The correct logic might be:
1. If there is a replacement, fail directly with md_error
2. If not, mark the badblock and set WantReplacement

I have some questions.

If a replacement device hits a write error, calling md_error directly will mark it Faulty and kick it out of the array. But it may just be a few bad sectors — recording the badblock and continuing the reshape is sufficient. Is it really necessary to call md_error directly in this case?

The current patch does:
1、Try to record the badblock first
2、On success:
a) member device: set WantReplacement to trigger a replacement
b) replacement device: skip WantReplacement to avoid a replacement loop
3、On failure: rdev_set_badblocks has already called md_error internally, no further action needed

Does this approach seem reasonable?

在 2026/6/12 11:42, Li Nan 写道:

On Mon Jun 1, 2026 at 1:46 PM CST, ghuicao wrote:

From: Cao Guanghui<caoguanghui@xxxxxxxxxx>

Replace the FIXME in end_reshape_write(). Instead of failing the device
immediately on write errors during reshape, attempt to record badblocks
using new_data_offset with is_new=1.

rdev_set_badblocks() returns true on success. On failure (e.g., badblocks
table full), it has already called md_error() internally to degrade the
device. Queue WantReplacement for member devices regardless of badblock
recording success, but skip this for replacement devices to avoid
replacement loops.

On successful write, clear stale badblock records at the new location
since data has migrated.

Signed-off-by: Cao Guanghui<caoguanghui@xxxxxxxxxx>
---
drivers/md/raid10.c | 27 ++++++++++++++++++++++++---
1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 4901ebe45c87..08d58a1c680e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4991,9 +4991,30 @@ static void end_reshape_write(struct bio *bio)
conf->mirrors[d].rdev;
if (bio->bi_status) {
- /* FIXME should record badblock */
- md_error(mddev, rdev);
- }
+ set_bit(WriteErrorSeen, &rdev->flags);
+
+ /* rdev_set_badblocks returns true on success.
+ * On failure, it has already called md_error() internally.
+ * Use is_new=1 as reshape writes target the new layout
+ * (new_data_offset).
+ */
+ if (rdev_set_badblocks(rdev, r10_bio->devs[slot].addr,
+ r10_bio->sectors, 1)) {
+ /* Queue async replacement for member devices
+ * For replacement devices, do not trigger WantReplacement
+ * to avoid circular replacement storms.
+ */
+ if (!repl) {
+ if (!test_and_set_bit(WantReplacement, &rdev->flags))
+ set_bit(MD_RECOVERY_NEEDED,
+ &rdev->mddev->recovery);

The logic here seems a bit odd — several mechanisms are chained together.

The correct logic might be:
1. If there is a replacement, fail directly with md_error
2. If not, mark the badblock and set WantReplacement

+ }
+ }
+ } else {
+ /* Write succeeded, clear stale badblock records */
+ rdev_clear_badblocks(rdev, r10_bio->devs[slot].addr,
+ r10_bio->sectors, 1);
+ }
rdev_dec_pending(rdev, mddev);
end_reshape_request(r10_bio);

Thanks, Cao Guanghui