Re: [PATCH v4 1/3] md: suspend array before raid10 reshape via sync_action

From: Chen Cheng

Date: Tue Jun 23 2026 - 00:31:19 EST

在 2026/6/21 03:43, yu kuai 写道:
> Hi,
>
> 在 2026/6/3 11:59, Chen Cheng 写道:
>> From: Chen Cheng <chencheng@xxxxxxxxx>
>>
>> The sync_action=reshape path currently enters mddev_start_reshape() with
>> reconfig_mutex held but without suspending the array first. For raid10,
>> that means raid10_start_reshape() has to drop reconfig_mutex and reacquire
>> the array through mddev_suspend_and_lock_nointr() before it can safely
>> switch geometry-dependent state.
>>
>> Use mddev_suspend_and_lock() for ACTION_RESHAPE in action_store(), so
>> the sysfs reshape path reaches mddev_start_reshape() with the array
>> already suspended and locked.
>>
>> Other sync_action operations keep using mddev_lock() unchanged.
>>
>> Signed-off-by: Chen Cheng <chencheng@xxxxxxxxx>
>> ---
>> drivers/md/md.c | 22 +++++++++++++++++-----
>> 1 file changed, 17 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/md/md.c b/drivers/md/md.c
>> index 096bb64e87bd..5bc937e149ac 100644
>> --- a/drivers/md/md.c
>> +++ b/drivers/md/md.c
>> @@ -5256,30 +5256,39 @@ static int mddev_start_reshape(struct mddev *mddev)
>>
>> static ssize_t
>> action_store(struct mddev *mddev, const char *page, size_t len)
>> {
>> int ret;
>> + bool suspended = false;
>> enum sync_action action;
>>
>> if (!mddev->pers || !mddev->pers->sync_request)
>> return -EINVAL;
>>
>> + action = md_sync_action_by_name(page);
>> retry:
>> if (work_busy(&mddev->sync_work))
>> flush_work(&mddev->sync_work);
>>
>> - ret = mddev_lock(mddev);
>> + if (action == ACTION_RESHAPE) {
>> + ret = mddev_suspend_and_lock(mddev);
>
> suspend during retry is too heavy. Please move suspend above retry.

hi,

If we move suspend before flush_work , that would have a deadlock window
from sashiko-bot review comment.

code like:
action_store(struct mddev *mddev, const char *page, size_t len)
{
...

+ action = md_sync_action_by_name(page);
+ if (action == ACTION_RESHAPE) {
+ ret = mddev_suspend(mddev, true);
+ if (ret)
+ return ret;
+ suspended = true;
+ }
retry:
if (work_busy(&mddev->sync_work))
flush_work(&mddev->sync_work);

ret = mddev_lock(mddev);

and the scenario is :

reshape:
action_store() -> mddev_suspend() -> flush_work()

mddev_suspend(), mark implicit GFP_NOIO allocation scope for thread#1 by
memalloc_noio_save().

flush_work() call sync_work handler md_start_sync() and would call
kzalloc_obj by flag GFP_KERNEL.

So.. when memory reclaim issue I/O to array and it's suspended,
md_handle_request() wait in !is_suspended(mddev, bio) by sleep , cause
deadlock.

Seems both way is unacceptable ?

>
>> + suspended = true;
>> + } else {
>> + ret = mddev_lock(mddev);
>> + suspended = false;
>> + }
>> if (ret)
>> return ret;
>>
>> if (work_busy(&mddev->sync_work)) {
>> - mddev_unlock(mddev);
>> + if (suspended)
>> + mddev_unlock_and_resume(mddev);
>> + else
>> + mddev_unlock(mddev);
>> goto retry;
>> }
>>
>> - action = md_sync_action_by_name(page);
>> -
>> /* TODO: mdadm rely on "idle" to start sync_thread. */
>> if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
>> switch (action) {
>> case ACTION_FROZEN:
>> md_frozen_sync_thread(mddev);
>> @@ -5344,11 +5353,14 @@ action_store(struct mddev *mddev, const char *page, size_t len)
>> md_wakeup_thread(mddev->thread);
>> sysfs_notify_dirent_safe(mddev->sysfs_action);
>> ret = len;
>>
>> out:
>> - mddev_unlock(mddev);
>> + if (suspended)
>> + mddev_unlock_and_resume(mddev);
>> + else
>> + mddev_unlock(mddev);
>> return ret;
>> }
>>
>> static struct md_sysfs_entry md_scan_mode =
>> __ATTR_PREALLOC(sync_action, S_IRUGO|S_IWUSR, action_show, action_store);
>