Re: [PATCH] raid10: badblock-aware reshape write error handling

From: Li Nan

Date: Thu Jun 11 2026 - 23:42:35 EST

On Mon Jun 1, 2026 at 1:46 PM CST, ghuicao wrote:
> From: Cao Guanghui <caoguanghui@xxxxxxxxxx>
>
> Replace the FIXME in end_reshape_write(). Instead of failing the device
> immediately on write errors during reshape, attempt to record badblocks
> using new_data_offset with is_new=1.
>
> rdev_set_badblocks() returns true on success. On failure (e.g., badblocks
> table full), it has already called md_error() internally to degrade the
> device. Queue WantReplacement for member devices regardless of badblock
> recording success, but skip this for replacement devices to avoid
> replacement loops.
>
> On successful write, clear stale badblock records at the new location
> since data has migrated.
>
> Signed-off-by: Cao Guanghui <caoguanghui@xxxxxxxxxx>
> ---
> drivers/md/raid10.c | 27 ++++++++++++++++++++++++---
> 1 file changed, 24 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 4901ebe45c87..08d58a1c680e 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -4991,9 +4991,30 @@ static void end_reshape_write(struct bio *bio)
> conf->mirrors[d].rdev;
>
> if (bio->bi_status) {
> - /* FIXME should record badblock */
> - md_error(mddev, rdev);
> - }
> + set_bit(WriteErrorSeen, &rdev->flags);
> +
> + /* rdev_set_badblocks returns true on success.
> + * On failure, it has already called md_error() internally.
> + * Use is_new=1 as reshape writes target the new layout
> + * (new_data_offset).
> + */
> + if (rdev_set_badblocks(rdev, r10_bio->devs[slot].addr,
> + r10_bio->sectors, 1)) {
> + /* Queue async replacement for member devices
> + * For replacement devices, do not trigger WantReplacement
> + * to avoid circular replacement storms.
> + */
> + if (!repl) {
> + if (!test_and_set_bit(WantReplacement, &rdev->flags))
> + set_bit(MD_RECOVERY_NEEDED,
> + &rdev->mddev->recovery);

The logic here seems convoluted: the repl check, WantReplacement and the md_error() hidden inside rdev_set_badblocks() are all chained together.

A cleaner flow might be:
1. If the failed write went to a replacement device (repl), fail it directly with md_error().
2. Otherwise, record the badblock and set WantReplacement.
> + }
> + }
> + } else {
> + /* Write succeeded, clear stale badblock records */
> + rdev_clear_badblocks(rdev, r10_bio->devs[slot].addr,
> + r10_bio->sectors, 1);
> + }
>
> rdev_dec_pending(rdev, mddev);
> end_reshape_request(r10_bio);

--
Thanks,
Nan