Re: [git pull] vfs fixes

From: Ian Kent
Date: Mon Apr 03 2017 - 20:47:59 EST


On Mon, 2017-04-03 at 01:00 -0500, Eric W. Biederman wrote:
> Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:
>
> > On Sun, Apr 02, 2017 at 05:58:41PM -0700, Linus Torvalds wrote:
> >
> > > I had to go and double-check that "DCACHE_DIRECTORY_TYPE" is what
> > > d_can_lookup() actually checks, so _that_ part is perhaps a bit
> > > subtle, and might be worth noting in that comment that you edited.
> > >
> > > So the real "rule" ends up being that we only ever look up things from
> > > dentries of type DCACHE_DIRECTORY_TYPE set, and those had better have
> > > DCACHE_RCUACCESS bit set.
> > >
> > > And the only reason path_init() only checks it for that case is that
> > > nd->root and nd->pwd both have to be of type d_can_lookup().
> > >
> > > Do we check that when we set it? I hope/assume we do.
> >
> > For chdir()/chroot()/pivot_root() it's done by LOOKUP_DIRECTORY in lookup
> > flags; fchdir() is slightly different - there we check S_ISDIR of inode
> > of opened file.ÂÂWhich is almost the same thing, except for
> > kinda-sorta directories that have no ->lookup() - we have them for
> > NFS referral points.ÂÂIt should be impossible to end up with
> > one of those opened - not even with O_PATH; follow_managed() will be called
> > and we'll either fail or cross into whatever ends up overmounting them.
> > Still, it might be cleaner to turn that check into
> > d_can_lookup(f.file->f_path.dentry)
> > simply for consistency sake.
> >
> > The thing I really don't like is mntns_install().ÂÂWith sufficiently
> > nasty nfsroot setup it might be possible to end up with referral point
> > as one's root/pwd; getting out of such state would be interesting...
> > Smells like that place should be a solitary follow_down(), not a loop
> > of follow_down_one(), but I want Eric's opinion on that one; userns stuff
> > is weird.
>
> If I read the conversation correctly the concern is that we might
> initialize a pwd or root with something that is almost but not quite a
> directory in mntns_install.
>
> Refereshing my memory.ÂÂd_automount mounts things and is what
> is used for nfs referrals.ÂÂd_manage blocks waiting for
> an automounts to complete or expire.ÂÂfollow_down just calls d_manage,
> follow_manage calls both d_manage and d_automount as appropriate.
>
> If the concern is nfs referral points calling follow_down is wrong and
> what is wanted is follow_managed.

The case Al was concerned about (sounds like) where the root (or pwd) being
followed is an NFS referral (a similar case could be NFS file system migration
if (when?) being used, and that's probably more likely to be triggered from a
file system root than a referral).

I can't see how that could happen for a referral, but if it did the follow would
need to call d_automount(). It's unlikely ->d_manage() would factor into it but
it is available for use so should be part of it.

So follow_down() rather than follow_down_one() sounds like the right thing to
do.

>
> The only thing that follow_down prevents is changing onto directories
> that are only half mounted, and not really directories yet.ÂÂWhich
> is certainly part of the invarient we want to preserve.
>
>
>
> The intent of the logic in mntns_install is to just pick a reasonable
> looking place somewhere in that mount namespace to use as a root
> directory.ÂÂI arbitrarily picked the top of the mount stack on "/".ÂÂWhich
> is typically used as the root directory.ÂÂIf people really care where
> their root is they save a directory file descriptor off somewhere and
> call chroot.ÂÂSo there is a little wiggle room exactly what the code
> does.
>
> There is a secondary use of mntns_install which is to give you a way to
> access what is under "/" if you are so foolish as to umount "/".ÂÂI keep
> thinking setns to your own mount namespace would be a handy way to get
> back to the rootfs and to use it for something during system shutdown.
> I don't know if anyone has actually used setns to your own mount
> namespace for that.
>
> The worst case I can see from the proposed change is we would
> not be able to umount all of the way down to rootfs.ÂÂThat
> would be a self inflicted wound so I don't care.
>
> I can't imagine anyone mounting an automount point deliberately on /
> except as way to confuse the vfs.ÂÂThough I can almost imagine getting
> there by accident if an automount expires.
>
> So yes please let's change the follow_down_one loop to follow_managed
> to preserve the invariant that we always have a directory that
> supports d_can_lookup to pass to set_fs_pwd and set_fs_root.
>
> Eric
>
> > diff --git a/fs/dcache.c b/fs/dcache.c
> > index 95d71eda8142..05550139a8a6 100644
> > --- a/fs/dcache.c
> > +++ b/fs/dcache.c
> > @@ -1757,7 +1757,13 @@ static unsigned d_flags_for_inode(struct inode
> > *inode)
> > Â return DCACHE_MISS_TYPE;
> > Â
> > Â if (S_ISDIR(inode->i_mode)) {
> > - add_flags = DCACHE_DIRECTORY_TYPE;
> > + /*
> > + Â* Any potential starting point of lookup should have
> > + Â* DCACHE_RCUACCESS; currently directory dentries
> > + Â* come from d_alloc() anyway, but it costs us nothing
> > + Â* to enforce it here.
> > + Â*/
> > + add_flags = DCACHE_DIRECTORY_TYPE | DCACHE_RCUACCESS;
> > Â if (unlikely(!(inode->i_opflags & IOP_LOOKUP))) {
> > Â if (unlikely(!inode->i_op->lookup))
> > Â add_flags = DCACHE_AUTODIR_TYPE;
> > diff --git a/fs/namei.c b/fs/namei.c
> > index d41fab78798b..19dcf62133cc 100644
> > --- a/fs/namei.c
> > +++ b/fs/namei.c
> > @@ -2145,6 +2145,9 @@ static const char *path_init(struct nameidata *nd,
> > unsigned flags)
> > Â int retval = 0;
> > Â const char *s = nd->name->name;
> > Â
> > + if (!*s)
> > + flags &= ~LOOKUP_RCU;
> > +
> > Â nd->last_type = LAST_ROOT; /* if there are only slashes... */
> > Â nd->flags = flags | LOOKUP_JUMPED | LOOKUP_PARENT;
> > Â nd->depth = 0;
> > diff --git a/fs/namespace.c b/fs/namespace.c
> > index cc1375eff88c..31ec9a79d2d4 100644
> > --- a/fs/namespace.c
> > +++ b/fs/namespace.c
> > @@ -3467,6 +3467,7 @@ static int mntns_install(struct nsproxy *nsproxy,
> > struct ns_common *ns)
> > Â struct fs_struct *fs = current->fs;
> > Â struct mnt_namespace *mnt_ns = to_mnt_ns(ns);
> > Â struct path root;
> > + int err;
> > Â
> > Â if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) ||
> > Â ÂÂÂÂ!ns_capable(current_user_ns(), CAP_SYS_CHROOT) ||
> > @@ -3484,15 +3485,14 @@ static int mntns_install(struct nsproxy *nsproxy,
> > struct ns_common *ns)
> > Â root.mntÂÂÂÂ= &mnt_ns->root->mnt;
> > Â root.dentry = mnt_ns->root->mnt.mnt_root;
> > Â path_get(&root);
> > - while(d_mountpoint(root.dentry) && follow_down_one(&root))
> > - ;
> > -
> > - /* Update the pwd and root */
> > - set_fs_pwd(fs, &root);
> > - set_fs_root(fs, &root);
> > -
> > + err = follow_down(&root);
> > + if (likely(!err)) {
> > + /* Update the pwd and root */
> > + set_fs_pwd(fs, &root);
> > + set_fs_root(fs, &root);
> > + }
> > Â path_put(&root);
> > - return 0;
> > + return err;
> > Â}
> > Â
> > Âstatic struct user_namespace *mntns_owner(struct ns_common *ns)
> > diff --git a/fs/open.c b/fs/open.c
> > index 949cef29c3bb..217b5db588c8 100644
> > --- a/fs/open.c
> > +++ b/fs/open.c
> > @@ -459,20 +459,17 @@ SYSCALL_DEFINE1(chdir, const char __user *, filename)
> > ÂSYSCALL_DEFINE1(fchdir, unsigned int, fd)
> > Â{
> > Â struct fd f = fdget_raw(fd);
> > - struct inode *inode;
> > Â int error = -EBADF;
> > Â
> > Â error = -EBADF;
> > Â if (!f.file)
> > Â goto out;
> > Â
> > - inode = file_inode(f.file);
> > -
> > Â error = -ENOTDIR;
> > - if (!S_ISDIR(inode->i_mode))
> > + if (!d_can_lookup(f.file->f_path.dentry))
> > Â goto out_putf;
> > Â
> > - error = inode_permission(inode, MAY_EXEC | MAY_CHDIR);
> > + error = inode_permission(file_inode(f.file), MAY_EXEC | MAY_CHDIR);
> > Â if (!error)
> > Â set_fs_pwd(current->fs, &f.file->f_path);
> > Âout_putf: