Re: [PATCH v6 1/2] md: Don't set MD_BROKEN for RAID1 and RAID10 when using FailFast

In reply to: Kenta Akagi: "[PATCH v6 1/2] md: Don't set MD_BROKEN for RAID1 and RAID10 when using FailFast"

From: Li Nan

Date: Mon Jan 05 2026 - 21:57:59 EST

On 2026/1/5 22:40, Kenta Akagi wrote:

> After commit 9631abdbf406 ("md: Set MD_BROKEN for RAID1 and RAID10"),
> if the error handler is called on the last rdev in RAID1 or RAID10,
> the MD_BROKEN flag will be set on that mddev.
> When MD_BROKEN is set, write bios to the md will result in an I/O error.
>
> This causes a problem when using FailFast.
> The current implementation of FailFast expects the array to continue
> functioning without issues even after calling md_error for the last
> rdev. Furthermore, due to the nature of its functionality, FailFast may
> call md_error on all rdevs of the md. Even if retrying I/O on an rdev
> would succeed, it first calls md_error before retrying.
>
> To fix this issue, this commit ensures that for RAID1 and RAID10, if the
> last In_sync rdev has the FailFast flag set and the mddev's fail_last_dev
> is off, the MD_BROKEN flag will not be set on that mddev.
>
> This change impacts userspace. After this commit, if the rdev has the
> FailFast flag, the mddev is never marked broken, even if the failing bio
> is not FailFast. However, it's unlikely that any setup using FailFast
> expects the array to halt when md_error is called on the last rdev.
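
To make the condition in the quoted description concrete, here is a minimal
userspace sketch of it. This is not the kernel code: struct model_rdev,
struct model_mddev and model_error() are hypothetical names invented for
illustration, and the assumption that the last In_sync device stays in the
array when fail_last_dev is off is a reading of existing behaviour, not
something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; these are NOT the kernel's structures. */
struct model_rdev {
	bool in_sync;   /* stands in for the In_sync flag */
	bool failfast;  /* stands in for the FailFast flag */
};

struct model_mddev {
	bool fail_last_dev;  /* stands in for the fail_last_dev setting */
	bool broken;         /* stands in for MD_BROKEN */
	int in_sync_count;   /* number of In_sync members left */
};

/*
 * Model of the error handler as described in the commit message.
 * Returns true if the rdev was taken out of the In_sync set.
 */
bool model_error(struct model_mddev *mddev, struct model_rdev *rdev)
{
	assert(mddev != NULL && rdev != NULL);

	bool is_last = rdev->in_sync && mddev->in_sync_count == 1;

	if (is_last) {
		/* Described change: skip MD_BROKEN only for FailFast
		 * rdevs while fail_last_dev is off. */
		if (!(rdev->failfast && !mddev->fail_last_dev))
			mddev->broken = true;

		/* Assumption: with fail_last_dev off, the last device
		 * is kept in the array. */
		if (!mddev->fail_last_dev)
			return false;
	}

	if (rdev->in_sync) {
		rdev->in_sync = false;
		mddev->in_sync_count--;
	}
	return true;
}
```

In this model, the only case that no longer ends with broken set is a
FailFast rdev that is the last In_sync member while fail_last_dev is off.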

In the current RAID design, when an I/O error occurs, RAID ensures faulty
data is not read via the following actions:
1. Mark the bad blocks (only when the FailFast flag is not set); if this fails,
2. Mark the disk as Faulty.

If neither action is taken and MD_BROKEN is not set to prevent further use of
the array, errors on the last remaining disk will be ignored, and subsequent
reads may return incorrect data. In my opinion, this is the more serious issue.
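
The concern above can be sketched as a small decision function. Again this
is only a userspace model under stated assumptions: enum model_outcome,
struct model_state and model_handle_io_error() are hypothetical names, and
the rule that the last disk is not marked Faulty is a simplification for
illustration rather than a description of the actual md code paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes of handling one I/O error (illustration only). */
enum model_outcome {
	RECORDED_BADBLOCK,  /* action 1: bad blocks were recorded */
	MARKED_FAULTY,      /* action 2: the disk was marked Faulty */
	ARRAY_BROKEN,       /* further use of the array is prevented */
	ERROR_IGNORED       /* nothing guards later reads */
};

struct model_state {
	bool failfast;            /* rdev has the FailFast flag */
	bool badblock_record_ok;  /* recording bad blocks would succeed */
	bool is_last_dev;         /* rdev is the last remaining disk */
	bool set_broken_on_last;  /* policy: set broken for the last disk */
};

enum model_outcome model_handle_io_error(const struct model_state *s)
{
	assert(s != NULL);

	/* Action 1 is attempted only when FailFast is not set. */
	if (!s->failfast && s->badblock_record_ok)
		return RECORDED_BADBLOCK;

	/* Action 2: assumed to apply to any disk except the last one. */
	if (!s->is_last_dev)
		return MARKED_FAULTY;

	/* Last disk: either stop using the array, or the error is lost. */
	return s->set_broken_on_last ? ARRAY_BROKEN : ERROR_IGNORED;
}
```

With failfast and is_last_dev both true and set_broken_on_last false, the
model returns ERROR_IGNORED, which is the case I am worried about.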

In scenarios with a large number of transient IO errors, is FailFast not a
suitable configuration? As you mentioned: "retrying I/O on an rdev would
succeed".

--
Thanks,
Nan