Re: [PATCH] md: fix array_state=clear sysfs deadlock
From: Xiao Ni
Date: Mon Mar 30 2026 - 11:57:37 EST
On Mon, Mar 30, 2026 at 1:55 PM Yu Kuai <yukuai@xxxxxxxxx> wrote:
>
> From: Yu Kuai <yukuai3@xxxxxxxxxx>
>
> When "clear" is written to array_state, md_attr_store() breaks sysfs
> active protection so the array can delete itself from its own sysfs
> store method.
>
> However, md_attr_store() currently drops the mddev reference before
> calling sysfs_unbreak_active_protection(). Once do_md_stop(..., 0)
> has made the mddev eligible for delayed deletion, the temporary
> kobject reference taken by sysfs_break_active_protection() can become
> the last kobject reference protecting the md kobject.
>
> That allows sysfs_unbreak_active_protection() to drop the last
> kobject reference from the current sysfs writer context. kobject
> teardown then recurses into kernfs removal while the current sysfs
> node is still being unwound, and lockdep reports recursive locking on
> kn->active with kernfs_drain() in the call chain.
>
> Reproducer on an existing level:
> 1. Create an md0 linear array and activate it:
>      mknod /dev/md0 b 9 0
>      echo none > /sys/block/md0/md/metadata_version
>      echo linear > /sys/block/md0/md/level
>      echo 1 > /sys/block/md0/md/raid_disks
>      echo "$(cat /sys/class/block/sdb/dev)" > /sys/block/md0/md/new_dev
>      echo "$(($(cat /sys/class/block/sdb/size) / 2))" > \
>        /sys/block/md0/md/dev-sdb/size
>      echo 0 > /sys/block/md0/md/dev-sdb/slot
>      echo active > /sys/block/md0/md/array_state
> 2. Wait briefly for the array to settle, then clear it:
>      sleep 2
>      echo clear > /sys/block/md0/md/array_state
>
> The warning looks like:
>
> WARNING: possible recursive locking detected
> bash/588 is trying to acquire lock:
> (kn->active#65) at __kernfs_remove+0x157/0x1d0
> but task is already holding lock:
> (kn->active#65) at sysfs_unbreak_active_protection+0x1f/0x40
> ...
> Call Trace:
> kernfs_drain
> __kernfs_remove
> kernfs_remove_by_name_ns
> sysfs_remove_group
> sysfs_remove_groups
> __kobject_del
> kobject_put
> md_attr_store
> kernfs_fop_write_iter
> vfs_write
> ksys_write
>
> Restore active protection before mddev_put() so the extra sysfs
> kobject reference is dropped while the mddev is still held alive. The
> actual md kobject deletion is then deferred until after the sysfs
> write path has fully returned.
>
> Fixes: 9e59d609763f ("md: call del_gendisk in control path")
> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> ---
> drivers/md/md.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 521d9b34cd9e..02efe9700256 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -6130,10 +6130,16 @@ md_attr_store(struct kobject *kobj, struct attribute *attr,
>  	}
>  	spin_unlock(&all_mddevs_lock);
>  	rv = entry->store(mddev, page, length);
> -	mddev_put(mddev);
>
> +	/*
> +	 * For "array_state=clear", dropping the extra kobject reference from
> +	 * sysfs_break_active_protection() can trigger md kobject deletion.
> +	 * Restore active protection before mddev_put() so deletion happens
> +	 * after the sysfs write path fully unwinds.
> +	 */
>  	if (kn)
>  		sysfs_unbreak_active_protection(kn);
> +	mddev_put(mddev);
>
>  	return rv;
>  }
> --
> 2.51.0
>
>
This patch looks good to me.
Reviewed-by: Xiao Ni <xni@xxxxxxxxxx>
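
For anyone following the ordering argument in the commit message, here is
a minimal userspace sketch of it. This is not kernel code. Every toy_*
name is hypothetical, and the model assumes that the final put on a
cleared array only queues a deferred deletion, which later drops the
long-lived kobject reference. It records which context ends up dropping
the last reference under each ordering.

```c
#include <assert.h>
#include <stdbool.h>

/* Context in which the last toy reference was dropped. */
enum toy_ctx { TOY_NONE, TOY_SYSFS_WRITER, TOY_WORKER };

struct toy_md {
	int kobj_refs;            /* stands in for the md kobject refcount */
	bool delete_queued;       /* deferred deletion has been scheduled */
	enum toy_ctx released_in; /* who dropped the final reference */
};

static void toy_kobject_put(struct toy_md *md, enum toy_ctx who)
{
	assert(md->kobj_refs > 0);
	if (--md->kobj_refs == 0)
		md->released_in = who;
}

/* Assumption: on a cleared array, the final put only queues deletion. */
static void toy_mddev_put(struct toy_md *md)
{
	md->delete_queued = true;
}

/* The deferred deletion drops the long-lived kobject reference. */
static void toy_run_deferred_delete(struct toy_md *md)
{
	if (md->delete_queued) {
		md->delete_queued = false;
		toy_kobject_put(md, TOY_WORKER);
	}
}

/* Models the tail of the store path under either ordering. */
static enum toy_ctx toy_attr_store(bool unbreak_before_put)
{
	struct toy_md md = {
		.kobj_refs = 1,
		.delete_queued = false,
		.released_in = TOY_NONE,
	};

	/* Temporary reference, as taken when breaking active protection. */
	md.kobj_refs++;

	if (unbreak_before_put) {
		/* Patched order: the temporary reference goes first. */
		toy_kobject_put(&md, TOY_SYSFS_WRITER);
		toy_mddev_put(&md);
		/* Run inline here for simplicity; stands for "later". */
		toy_run_deferred_delete(&md);
	} else {
		/* Old order, worst case: deletion runs before unbreak. */
		toy_mddev_put(&md);
		toy_run_deferred_delete(&md);
		toy_kobject_put(&md, TOY_SYSFS_WRITER);
	}
	return md.released_in;
}
```

In this model, toy_attr_store(false) reports the final put in the sysfs
writer context, which corresponds to the case lockdep complains about,
while toy_attr_store(true) leaves the final put to the deferred step.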