Re: [PATCH RFC 0/1] mount: universally disallow mounting over symlinks

From: Ian Kent
Date: Sun Jan 12 2020 - 20:48:36 EST


On Fri, 2020-01-10 at 23:19 +0000, Al Viro wrote:
> On Fri, Jan 03, 2020 at 01:49:01AM +0000, Al Viro wrote:
> > On Thu, Jan 02, 2020 at 02:59:20PM +1100, Aleksa Sarai wrote:
> > > On 2020-01-01, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> > > > On Thu, Jan 02, 2020 at 01:44:07AM +1100, Aleksa Sarai wrote:
> > > >
> > > > > Thanks, this fixes the issue for me (and also fixes another
> > > > > reproducer I
> > > > > found -- mounting a symlink on top of itself then trying to
> > > > > umount it).
> > > > >
> > > > > Reported-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
> > > > > Tested-by: Aleksa Sarai <cyphar@xxxxxxxxxx>
> > > >
> > > > Pushed into #fixes.
> > >
> > > Thanks. One other thing I noticed is that umount applies to the
> > > underlying symlink rather than the mountpoint on top. So, for
> > > example
> > > (using the same scripts I posted in the thread):
> > >
> > > # ln -s /tmp/foo link
> > > # ./mount_to_symlink /etc/passwd link
> > > # umount -l link # will attempt to unmount "/tmp/foo"
> > >
> > > Is that intentional?
> >
> > It's a mess, again in mountpoint_last(). FWIW, at some point I
> > proposed
> > to have nd_jump_link() to fail with -ELOOP if the target was a
> > symlink;
> > Linus asked for reasons deeper than my dislike of the semantics, I
> > looked
> > around and hadn't spotted anything. And there hadn't been at the
> > time,
> > but when four months later umount_lookup_last() went in I failed to
> > look
> > for that source of potential problems in it ;-/
>
> FWIW, since Ian appears to agree that we want ->d_manage() on the
> mount
> crossing at the end of umount(2) lookup, here's a much simpler
> solution -
> kill mountpoint_last() and switch to using lookup_last(). As a side
> benefit, LOOKUP_NO_REVAL also goes away. It's possible to trim the
> things even more (path_mountpoint() is very similar to
> path_lookupat()
> at that point, and it's not hard to make the differences conditional
> on
> something like LOOKUP_UMOUNT); I would rather do that part in the
> cleanups series - the one below is easier to backport.
>
> Aleksa, Ian - could you see if the patch below works for you?

I did try this patch and I was trying to work out why it didn't
work. But thought I'd let you know what I saw.

Applying it to current Linus tree systemd stops at switch root.

Not sure what causes that, I couldn't see any reason for it.

I see you have a development branch in your repo. I'll have a look
at that rather than continue with this.

>
> commit e56b43b971a7c08762fceab330a52b7245041dbc
> Author: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
> Date: Fri Jan 10 17:17:19 2020 -0500
>
> reimplement path_mountpoint() with less magic
>
> ... and get rid of a bunch of bugs in it. Background:
> the reason for path_mountpoint() is that umount() really doesn't
> want attempts to revalidate the root of what it's trying to
> umount.
> The thing we want to avoid actually happen from complete_walk();
> solution was to do something parallel to normal path_lookupat()
> and it both went overboard and got the boilerplate subtly
> (and not so subtly) wrong.
>
> A better solution is to do pretty much what the normal
> path_lookupat()
> does, but instead of complete_walk() do unlazy_walk(). All it
> takes
> to avoid that ->d_weak_revalidate() call... mountpoint_last()
> goes
> away, along with everything it got wrong, and so does the magic
> around
> LOOKUP_NO_REVAL.
>
> Another source of bugs is that when we traverse mounts at the
> final
> location (and we need to do that - umount . expects to get
> whatever's
> overmounting ., if any, out of the lookup) we really ought to
> take
> care of ->d_manage() - as it is, manual umount of autofs
> automount
> in progress can lead to unpleasant surprises for the
> daemon. Easily
> solved by using handle_lookup_down() instead of follow_mount().
>
> Signed-off-by: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
>
> diff --git a/fs/namei.c b/fs/namei.c
> index d6c91d1e88cb..1793661c3342 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -1649,17 +1649,15 @@ static struct dentry *__lookup_slow(const
> struct qstr *name,
> if (IS_ERR(dentry))
> return dentry;
> if (unlikely(!d_in_lookup(dentry))) {
> - if (!(flags & LOOKUP_NO_REVAL)) {
> - int error = d_revalidate(dentry, flags);
> - if (unlikely(error <= 0)) {
> - if (!error) {
> - d_invalidate(dentry);
> - dput(dentry);
> - goto again;
> - }
> + int error = d_revalidate(dentry, flags);
> + if (unlikely(error <= 0)) {
> + if (!error) {
> + d_invalidate(dentry);
> dput(dentry);
> - dentry = ERR_PTR(error);
> + goto again;
> }
> + dput(dentry);
> + dentry = ERR_PTR(error);
> }
> } else {
> old = inode->i_op->lookup(inode, dentry, flags);
> @@ -2618,72 +2616,6 @@ int user_path_at_empty(int dfd, const char
> __user *name, unsigned flags,
> EXPORT_SYMBOL(user_path_at_empty);
>
> /**
> - * mountpoint_last - look up last component for umount
> - * @nd: pathwalk nameidata - currently pointing at parent
> directory of "last"
> - *
> - * This is a special lookup_last function just for umount. In this
> case, we
> - * need to resolve the path without doing any revalidation.
> - *
> - * The nameidata should be the result of doing a LOOKUP_PARENT
> pathwalk. Since
> - * mountpoints are always pinned in the dcache, their ancestors are
> too. Thus,
> - * in almost all cases, this lookup will be served out of the
> dcache. The only
> - * cases where it won't are if nd->last refers to a symlink or the
> path is
> - * bogus and it doesn't exist.
> - *
> - * Returns:
> - * -error: if there was an error during lookup. This includes
> -ENOENT if the
> - * lookup found a negative dentry.
> - *
> - * 0: if we successfully resolved nd->last and found it to not
> to be a
> - * symlink that needs to be followed.
> - *
> - * 1: if we successfully resolved nd->last and found it to be a
> symlink
> - * that needs to be followed.
> - */
> -static int
> -mountpoint_last(struct nameidata *nd)
> -{
> - int error = 0;
> - struct dentry *dir = nd->path.dentry;
> - struct path path;
> -
> - /* If we're in rcuwalk, drop out of it to handle last component
> */
> - if (nd->flags & LOOKUP_RCU) {
> - if (unlazy_walk(nd))
> - return -ECHILD;
> - }
> -
> - nd->flags &= ~LOOKUP_PARENT;
> -
> - if (unlikely(nd->last_type != LAST_NORM)) {
> - error = handle_dots(nd, nd->last_type);
> - if (error)
> - return error;
> - path.dentry = dget(nd->path.dentry);
> - } else {
> - path.dentry = d_lookup(dir, &nd->last);
> - if (!path.dentry) {
> - /*
> - * No cached dentry. Mounted dentries are
> pinned in the
> - * cache, so that means that this dentry is
> probably
> - * a symlink or the path doesn't actually point
> - * to a mounted dentry.
> - */
> - path.dentry = lookup_slow(&nd->last, dir,
> - nd->flags |
> LOOKUP_NO_REVAL);
> - if (IS_ERR(path.dentry))
> - return PTR_ERR(path.dentry);
> - }
> - }
> - if (d_flags_negative(smp_load_acquire(&path.dentry->d_flags)))
> {
> - dput(path.dentry);
> - return -ENOENT;
> - }
> - path.mnt = nd->path.mnt;
> - return step_into(nd, &path, 0, d_backing_inode(path.dentry),
> 0);
> -}
> -
> -/**
> * path_mountpoint - look up a path to be umounted
> * @nd: lookup context
> * @flags: lookup flags
> @@ -2699,14 +2631,17 @@ path_mountpoint(struct nameidata *nd,
> unsigned flags, struct path *path)
> int err;
>
> while (!(err = link_path_walk(s, nd)) &&
> - (err = mountpoint_last(nd)) > 0) {
> + (err = lookup_last(nd)) > 0) {
> s = trailing_symlink(nd);
> }
> + if (!err)
> + err = unlazy_walk(nd);
> + if (!err)
> + err = handle_lookup_down(nd);
> if (!err) {
> *path = nd->path;
> nd->path.mnt = NULL;
> nd->path.dentry = NULL;
> - follow_mount(path);
> }
> terminate_walk(nd);
> return err;
> diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
> index f64a33d2a1d1..2a82dcce5fc1 100644
> --- a/fs/nfs/nfstrace.h
> +++ b/fs/nfs/nfstrace.h
> @@ -206,7 +206,6 @@ TRACE_DEFINE_ENUM(LOOKUP_AUTOMOUNT);
> TRACE_DEFINE_ENUM(LOOKUP_PARENT);
> TRACE_DEFINE_ENUM(LOOKUP_REVAL);
> TRACE_DEFINE_ENUM(LOOKUP_RCU);
> -TRACE_DEFINE_ENUM(LOOKUP_NO_REVAL);
> TRACE_DEFINE_ENUM(LOOKUP_OPEN);
> TRACE_DEFINE_ENUM(LOOKUP_CREATE);
> TRACE_DEFINE_ENUM(LOOKUP_EXCL);
> @@ -224,7 +223,6 @@ TRACE_DEFINE_ENUM(LOOKUP_DOWN);
> { LOOKUP_PARENT, "PARENT" }, \
> { LOOKUP_REVAL, "REVAL" }, \
> { LOOKUP_RCU, "RCU" }, \
> - { LOOKUP_NO_REVAL, "NO_REVAL" }, \
> { LOOKUP_OPEN, "OPEN" }, \
> { LOOKUP_CREATE, "CREATE" }, \
> { LOOKUP_EXCL, "EXCL" }, \
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 7fe7b87a3ded..07bfb0874033 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -34,7 +34,6 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT,
> LAST_BIND};
>
> /* internal use only */
> #define LOOKUP_PARENT 0x0010
> -#define LOOKUP_NO_REVAL 0x0080
> #define LOOKUP_JUMPED 0x1000
> #define LOOKUP_ROOT 0x2000
> #define LOOKUP_ROOT_GRABBED 0x0008