[PATCH] md: avoid modifying spares while the array is not suspended
From: Abd-Alrhman Masalkhi
Date: Tue Jun 30 2026 - 04:05:30 EST
remove_spares() and remove_and_add_spares() modify the array's rdev
configuration. These operations are only safe after the array has been
suspended.
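Today, md_start_sync() decides whether to suspend the array before it takes
the mddev lock. It then calls md_choose_sync_action() (and, for read-only
arrays, remove_and_add_spares()) regardless of that decision. If an rdev
becomes removable after the decision, for example because its Blocked flag
is cleared in the meantime, these helpers may remove or replace rdevs while
normal I/O is still accessing them.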
The race can occur as follows:
raid10d         Worker                     Normal IO
____________    _______________________    ______________________
                                           raid10_write_request()
                                           wait_blocked_dev()
set Blocked
set Faulty
                                           Skip Faulty rdev
                                           rrdev->nr_pending++
                                           .repl_bio = bio
                removeable_rdev = false    .
                array not suspended        .
lock mddev                                 goto err_handle
                lock mddev (wait)
                .
update sb       .
clear Blocked   .
                .
unlock mddev    .
                lock mddev (acquires)
                remove_spares()
                removeable_rdev = true
                raid10_remove_disk()
                rdev = replacement
                replacement = NULL
                                           rdev_dec_pending(NULL)
                unlock mddev               (NULL)->nr_pending--
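In this case, the err_handle path of raid10_write_request() passes the
replacement pointer that raid10_remove_disk() has just set to NULL to
rdev_dec_pending(). Decrementing nr_pending through that pointer then
causes a NULL pointer dereference.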
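Fix this by calling remove_spares() and remove_and_add_spares() only when
md_start_sync() has suspended the array. md_choose_sync_action() now takes
an array_suspended argument. When it is false, spare removal is skipped and
*spares is set to 0. The read-only path in md_start_sync() is gated the
same way. This prevents rdev configuration changes while normal I/O is in
progress.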
Fixes: bc08041b32ab ("md: suspend array in md_start_sync() if array need reconfiguration")
Reported-by: sashiko-bot <sashiko-bot@xxxxxxxxxx>
Closes: https://sashiko.dev/#/patchset/20260628142420.1051027-1-abd.masalkhi@xxxxxxxxx?part=3
Signed-off-by: Abd-Alrhman Masalkhi <abd.masalkhi@xxxxxxxxx>
---
drivers/md/md.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 66a41d482e59..c85ebb59535b 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -10116,7 +10116,8 @@ static int remove_and_add_spares(struct mddev *mddev,
 	return spares;
 }
 
-static bool md_choose_sync_action(struct mddev *mddev, int *spares)
+static bool md_choose_sync_action(struct mddev *mddev, int *spares,
+				  bool array_suspended)
 {
 	/* Check if reshape is in progress first. */
 	if (mddev->reshape_position != MaxSector) {
@@ -10132,7 +10133,9 @@ static bool md_choose_sync_action(struct mddev *mddev, int *spares)
 
 	/* Check if resync is in progress. */
 	if (mddev->resync_offset < MaxSector) {
-		remove_spares(mddev, NULL);
+		if (array_suspended)
+			remove_spares(mddev, NULL);
+
 		set_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 		clear_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
 		clear_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery);
@@ -10144,7 +10147,11 @@ static bool md_choose_sync_action(struct mddev *mddev, int *spares)
 	 * also removed and re-added, to allow the personality to fail the
 	 * re-add.
 	 */
-	*spares = remove_and_add_spares(mddev, NULL);
+	if (array_suspended)
+		*spares = remove_and_add_spares(mddev, NULL);
+	else
+		*spares = 0;
+
 	if (*spares || test_bit(MD_RECOVERY_LAZY_RECOVER, &mddev->recovery)) {
 		clear_bit(MD_RECOVERY_SYNC, &mddev->recovery);
 		clear_bit(MD_RECOVERY_CHECK, &mddev->recovery);
@@ -10189,11 +10196,12 @@ static void md_start_sync(struct work_struct *ws)
 		 * As we only add devices that are already in-sync, we can
 		 * activate the spares immediately.
 		 */
-		remove_and_add_spares(mddev, NULL);
+		if (suspend)
+			remove_and_add_spares(mddev, NULL);
 		goto not_running;
 	}
 
-	if (!md_choose_sync_action(mddev, &spares))
+	if (!md_choose_sync_action(mddev, &spares, suspend))
 		goto not_running;
 
 	if (!mddev->pers->sync_request)
--
2.43.0