Re: mnt_list corruption triggered during btrfs/326
From: Christian Brauner
Date: Thu Jan 09 2025 - 07:52:09 EST
On Tue, Jan 07, 2025 at 04:00:34PM +0100, Christian Brauner wrote:
> > > Can you please try and reproduce this with
> > > commit 211364bef4301838b2e1 ("fs: kill MNT_ONRB")
> >
> > This patch should indirectly address both errors but it does not
> > explain why the flag is sometimes missing.
>
> Yeah, I'm well aware that's why I didn't fast-track it.
> I just didn't have the time to think about this yet.
I think I know how it happens.

btrfs_get_tree_subvol()
{
        mnt = fc_mount()
        // Register the newly allocated mount with sb->s_mounts:
        lock_mount_hash();
        list_add_tail(&mnt->mnt_instance, &mnt->mnt.mnt_sb->s_mounts);
        unlock_mount_hash();
}

So now it's public on sb->s_mounts.
Concurrently someone does a ro remount:

reconfigure_super()
-> sb_prepare_remount_readonly()
   {
           list_for_each_entry(mnt, &sb->s_mounts, mnt_instance) {
           }
   }

This walks all mounts registered in sb->s_mounts and raises
MNT_WRITE_HOLD, then raises MNT_READONLY, and then removes
MNT_WRITE_HOLD.
This can happen concurrently with mount_subvol() because sb->s_umount
isn't held anymore:

-> mount_subvol()
   -> mount_subtree()
      -> alloc_mnt_ns()
         mnt_add_to_ns()
         vfs_path_lookup()
         put_mnt_ns()

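The flag modification in mnt_add_to_ns() races with the flag
modification done by the read-only remount. Both are read-modify-write
updates of the same mnt_flags word, so one update can overwrite the
other. So MNT_ONRB might be lost...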
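
To make the lost update concrete, here is a minimal userspace sketch that
replays one bad interleaving step by step in a single thread. It is not
kernel code: the DEMO_* bit values and demo_lost_update() are made up for
illustration, and the exact interleaving is an assumption about how the
two updates could overlap.

```c
#include <assert.h>

/* Hypothetical bit values for illustration; not the kernel's definitions. */
#define DEMO_MNT_READONLY   0x01u
#define DEMO_MNT_WRITE_HOLD 0x02u
#define DEMO_MNT_ONRB       0x04u

/*
 * Replays one interleaving of two unsynchronized read-modify-write
 * updates to the same flags word and returns the final flags value.
 */
unsigned int demo_lost_update(void)
{
        unsigned int flags = 0;

        /* Remount side: loads the flags word before the mount side stores. */
        unsigned int remount_tmp = flags;

        /* Mount side: its whole "flags |= ONRB" lands in between. */
        flags |= DEMO_MNT_ONRB;
        assert(flags & DEMO_MNT_ONRB);

        /* Remount side: stores a value computed from the stale load. */
        remount_tmp |= DEMO_MNT_WRITE_HOLD;
        flags = remount_tmp;            /* ONRB is overwritten here */

        /* Remount side: raises READONLY, then drops WRITE_HOLD. */
        flags |= DEMO_MNT_READONLY;
        flags &= ~DEMO_MNT_WRITE_HOLD;

        return flags;
}
```

After this sequence only the read-only bit is left in the returned value.
The bit set by the mount side is gone even though nobody cleared it
explicitly.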
If that's correct, then a) we know how this happens and b) killing
MNT_ONRB is the correct fix for this.