Re: [PATCH 02/17] fix automount/automount race properly
From: Christian Brauner
Date: Thu Jan 30 2020 - 09:34:22 EST
On Sun, Jan 19, 2020 at 03:17:14AM +0000, Al Viro wrote:
> From: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>
> Protection against automount/automount races (two threads hitting the same
> referral point at the same time) is based upon do_add_mount() prevention of
> identical overmounts - trying to overmount the root of mounted tree with
> the same tree fails with -EBUSY. It's unreliable (the other thread might've
> mounted something on top of the automount it has triggered) *and* causes
> no end of headache for follow_automount() and its caller, since
> finish_automount() behaves like do_new_mount() - if the mountpoint to be is
> overmounted, it mounts on top what's overmounting it. It's not only wrong
> (we want to go into what's overmounting the automount point and quietly
> discard what we planned to mount there), it introduces the possibility of
> original parent mount getting dropped. That's what 8aef18845266 (VFS: Fix
> vfsmount overput on simultaneous automount) deals with, but it can't do
> anything about the reliability of conflict detection - if something had
> been overmounted the other thread's automount (e.g. that other thread
> having stepped into automount in mount(2)), we don't get that -EBUSY and
> the result is
> referral point under automounted NFS under explicit overmount
> under another copy of automounted NFS
>
> What we need is finish_automount() *NOT* digging into overmounts - if it
> finds one, it should just quietly discard the thing it was asked to mount.
> And don't bother with actually crossing into the results of finish_automount() -
> the same loop that calls follow_automount() will do that just fine on the
> next iteration.
>
> IOW, instead of calling lock_mount() have finish_automount() do it manually,
> _without_ the "move into overmount and retry" part. And leave crossing into
> the results to the caller of follow_automount(), which simplifies it a lot.
>
> Moral: if you end up with a lot of glue working around the calling conventions
> of something, perhaps these calling conventions are simply wrong...
>
> Fixes: 8aef18845266 (VFS: Fix vfsmount overput on simultaneous automount)
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
I mean, just reading this is awefully complicated but the code seems
fine.
Acked-by: Christian Brauner <christian.brauner@xxxxxxxxxx>
> ---
> fs/namei.c | 29 ++++-------------------------
> fs/namespace.c | 41 ++++++++++++++++++++++++++++++++++-------
> 2 files changed, 38 insertions(+), 32 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index d2720dc71d0e..bd036dfdb0d9 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1133,11 +1133,9 @@ EXPORT_SYMBOL(follow_up);
> * - return -EISDIR to tell follow_managed() to stop and return the path we
> * were called with.
> */
> -static int follow_automount(struct path *path, struct nameidata *nd,
> - bool *need_mntput)
> +static int follow_automount(struct path *path, struct nameidata *nd)
> {
> struct vfsmount *mnt;
> - int err;
>
> if (!path->dentry->d_op || !path->dentry->d_op->d_automount)
> return -EREMOTE;
> @@ -1178,29 +1176,10 @@ static int follow_automount(struct path *path, struct nameidata *nd,
> return PTR_ERR(mnt);
> }
>
> - if (!mnt) /* mount collision */
> - return 0;
> -
> - if (!*need_mntput) {
> - /* lock_mount() may release path->mnt on error */
> - mntget(path->mnt);
> - *need_mntput = true;
> - }
> - err = finish_automount(mnt, path);
> -
> - switch (err) {
> - case -EBUSY:
> - /* Someone else made a mount here whilst we were busy */
> + if (!mnt)
> return 0;
> - case 0:
> - path_put(path);
> - path->mnt = mnt;
> - path->dentry = dget(mnt->mnt_root);
> - return 0;
> - default:
> - return err;
> - }
>
> + return finish_automount(mnt, path);
> }
>
> /*
> @@ -1258,7 +1237,7 @@ static int follow_managed(struct path *path, struct nameidata *nd)
>
> /* Handle an automount point */
> if (flags & DCACHE_NEED_AUTOMOUNT) {
> - ret = follow_automount(path, nd, &need_mntput);
> + ret = follow_automount(path, nd);
> if (ret < 0)
> break;
> continue;
> diff --git a/fs/namespace.c b/fs/namespace.c
> index 5f0a80f17651..f1817eb5f87d 100644
> --- a/fs/namespace.c
> +++ b/fs/namespace.c
> @@ -2823,6 +2823,7 @@ static int do_new_mount(struct path *path, const char *fstype, int sb_flags,
>
> int finish_automount(struct vfsmount *m, struct path *path)
> {
> + struct dentry *dentry = path->dentry;
> struct mount *mnt = real_mount(m);
> struct mountpoint *mp;
> int err;
> @@ -2832,21 +2833,47 @@ int finish_automount(struct vfsmount *m, struct path *path)
> BUG_ON(mnt_get_count(mnt) < 2);
>
> if (m->mnt_sb == path->mnt->mnt_sb &&
> - m->mnt_root == path->dentry) {
> + m->mnt_root == dentry) {
> err = -ELOOP;
> - goto fail;
> + goto discard;
> }
>
> - mp = lock_mount(path);
> + /*
> + * we don't want to use lock_mount() - in this case finding something
> + * that overmounts our mountpoint to be means "quitely drop what we've
> + * got", not "try to mount it on top".
> + */
> + inode_lock(dentry->d_inode);
> + if (unlikely(cant_mount(dentry))) {
> + err = -ENOENT;
> + goto discard1;
> + }
> + namespace_lock();
> + rcu_read_lock();
> + if (unlikely(__lookup_mnt(path->mnt, dentry))) {
That means someone has already performed that mount in the meantime, I
take it.
> + rcu_read_unlock();
> + err = 0;
> + goto discard2;
> + }
> + rcu_read_unlock();
> + mp = get_mountpoint(dentry);
> if (IS_ERR(mp)) {
> err = PTR_ERR(mp);
> - goto fail;
> + goto discard2;
> }
> +
> err = do_add_mount(mnt, mp, path, path->mnt->mnt_flags | MNT_SHRINKABLE);
> unlock_mount(mp);
> - if (!err)
> - return 0;
> -fail:
> + if (unlikely(err))
> + goto discard;
> + mntput(m);
Probably being dense here but better safe than sorry: this mntput()
corresponds to the get_mountpoint() above, right?