Re: [BUG] possible race between md_free_disk and md_notify_reboot

From: Yu Kuai
Date: Thu Feb 20 2025 - 06:55:35 EST


Hi,

On 2025/02/20 12:06, Yu Kuai wrote:
Hi,

On 2025/02/20 11:45, Guillaume Morin wrote:
On 20 Feb 11:19, Yu Kuai wrote:

Hi,

On 2025/02/20 11:05, Guillaume Morin wrote:
how it was guaranteed that mddev_get() would fail as mddev_free() does not check or synchronize with the active atomic

Please check how mddev is freed, start from mddev_put(). There might be
something wrong, but it's not what you said.

I will take a look. Though if you're confident that this logic protects
against any UAF, that makes sense to me.

However, as I mentioned, this is not what the crash was about (I mentioned
the UAF in passing). The GPF seems to be about deleting the _next_
element while iterating over all mddevs. The mddev_get() on the
current item is not going to help with this.


You don't need to emphasize this; it is still speculation without solid
theoretical analysis. The point about mddev_get() is that it's done
inside the lock, which should guarantee that continuing the iteration
is safe.

I just took a quick look, and the problem looks obvious to me; see how
md_seq_show() handles the iteration.

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 465ca2af1e6e..7c7a58f618c1 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -9911,8 +9911,11 @@ static int md_notify_reboot(struct notifier_block *this,
                        mddev_unlock(mddev);
                }
                need_delay = 1;
-               mddev_put(mddev);
-               spin_lock(&all_mddevs_lock);
+
+               spin_lock(&all_mddevs_lock);
+               if (atomic_dec_and_test(&mddev->active))
+                       __mddev_put(mddev);
+
        }
        spin_unlock(&all_mddevs_lock);
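To make the lifetime rule concrete, here is a userspace C sketch of the pattern the diff applies (my own simplified illustration, not kernel code: `node`, `walk()` and `__node_put()` are made-up stand-ins for struct mddev, the md_seq_show()-style iteration and __mddev_put(), and a single-threaded flag stands in for all_mddevs_lock):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Stand-in for struct mddev: a refcounted node on a global list. */
struct node {
	atomic_int active;            /* like mddev->active */
	struct node *next;            /* like the all_mddevs linkage */
};

static struct node *head;
static int lock_held;             /* "all_mddevs_lock" (single-threaded demo) */

static void list_lock(void)   { assert(!lock_held); lock_held = 1; }
static void list_unlock(void) { assert(lock_held);  lock_held = 0; }

static void push(void)
{
	struct node *n = malloc(sizeof(*n));

	atomic_init(&n->active, 1);   /* the list's own reference */
	n->next = head;
	head = n;
}

/* Last reference gone: unlink and free.  Caller holds the list lock. */
static void __node_put(struct node *n)
{
	assert(lock_held);
	for (struct node **pp = &head; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			break;
		}
	}
	free(n);
}

/* Iterate while letting the per-node work run without the lock: pin the
 * current node with a reference taken under the lock, drop the lock for
 * the work, then retake the lock and drop the reference *while still
 * holding it*, reading ->next only under the lock. */
static int walk(void (*work)(struct node *))
{
	int visited = 0;

	list_lock();
	for (struct node *n = head; n; ) {
		atomic_fetch_add(&n->active, 1);    /* "mddev_get()" under the lock */
		list_unlock();

		work(n);                            /* lock dropped; may sleep */
		visited++;

		list_lock();
		struct node *next = n->next;        /* safe: our ref keeps n linked */
		if (atomic_fetch_sub(&n->active, 1) == 1)
			__node_put(n);              /* last ref: free under the lock */
		n = next;
	}
	list_unlock();
	return visited;
}
```

If work() drops the node's base reference (simulating a deletion racing with the walk while the lock was dropped), the node is freed here under the lock, and the iteration continues from a ->next that was read while it was still guaranteed valid.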

While cooking the patch, I realized this is not enough: list_for_each_entry_safe()
should be replaced with list_for_each_entry() as well.
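To illustrate why: list_for_each_entry_safe() caches the next element before the loop body runs, so if that element is deleted while the lock is dropped, the cached pointer dangles when the loop advances. A contrived userspace C comparison of the two loop shapes (my own sketch, not the kernel macros; `dead` marks a node that would really have been freed, so stepping onto it stands for the GPF):

```c
#include <stddef.h>

struct item { int dead; struct item *next; };

/* Build the list buf[0] -> buf[1] -> buf[2]. */
static struct item *build3(struct item buf[3])
{
	buf[0] = (struct item){ 0, &buf[1] };
	buf[1] = (struct item){ 0, &buf[2] };
	buf[2] = (struct item){ 0, NULL };
	return &buf[0];
}

/* The loop body plays the concurrent deleter: while the lock would be
 * dropped, it removes cur's successor, the way another path can remove
 * the _next_ mddev from all_mddevs. */
static void delete_successor(struct item *cur)
{
	if (cur->next) {
		cur->next->dead = 1;          /* "freed" */
		cur->next = cur->next->next;  /* unlinked */
	}
}

/* list_for_each_entry_safe() style: next is cached *before* the body
 * runs, so the walk still steps onto the deleted element. */
static int walk_safe_style(struct item *cur)
{
	int dead_visits = 0;
	struct item *nxt = cur ? cur->next : NULL;

	for (; cur; cur = nxt, nxt = cur ? cur->next : NULL) {
		if (cur->dead)
			dead_visits++;        /* would be a use-after-free */
		delete_successor(cur);
	}
	return dead_visits;
}

/* list_for_each_entry() style: ->next is re-read after the body (i.e.
 * after the lock is retaken), so deleted elements are never touched. */
static int walk_reread_style(struct item *cur)
{
	int dead_visits = 0;

	for (; cur; cur = cur->next) {
		if (cur->dead)
			dead_visits++;
		delete_successor(cur);
	}
	return dead_visits;
}
```

In the real code the ->next re-read is safe only because the reference on the current element, taken under the lock, keeps that element on the list.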

Will send the patch soon, with:

Reported-by: Guillaume Morin <guillaume@xxxxxxxxxxx>

Thanks,
Kuai


Thanks,
Kuai
