Re: [RFC v2 PATCH 0/8] VFS:userns: support portable root filesystems

From: James Bottomley
Date: Tue May 10 2016 - 23:47:50 EST


On Wed, 2016-05-11 at 01:53 +0100, Al Viro wrote:
> On Tue, May 10, 2016 at 04:36:56PM -0700, James Bottomley wrote:
> > +static int shiftfs_rename2(struct inode *olddir, struct dentry *old,
> > + struct inode *newdir, struct dentry *new,
> > + unsigned int flags)
> > +{
> > + struct dentry *rodd = olddir->i_private, *rndd = newdir->i_private,
> > + *realold = old->d_inode->i_private,
> > + *realnew = new->d_inode->i_private;
> > + struct inode *realolddir = rodd->d_inode, *realnewdir = rndd->d_inode;
> > + const struct inode_operations *iop = realolddir->i_op;
> > + int err;
> > + const struct cred *oldcred, *newcred;
> > +
> > + oldcred = shiftfs_new_creds(&newcred, old->d_sb);
> > + err = iop->rename2(realolddir, realold, realnewdir, realnew, flags);
> > + shiftfs_old_creds(oldcred, &newcred);
>
> ... and you've just violated all locking rules for ->rename2().

Yes, sorry, somehow I missed that when I converted everything else to
the vfs_ functions.

> > +static struct dentry *shiftfs_lookup(struct inode *dir, struct dentry *dentry,
> > + unsigned int flags)
> > +{
> > + struct dentry *real = dir->i_private, *new;
> > + struct inode *reali = real->d_inode, *newi;
> > + const struct cred *oldcred, *newcred;
> > +
> > + /* note: violation of usual fs rules here: dentries are never
> > + * added with d_add. This is because we want no dentry cache
> > + * for shiftfs. All lookups proceed through the dentry cache
> > + * of the underlying filesystem, meaning we always see any
> > + * changes in the underlying */
>
> Bloody wonderful. So
> * we lose caching the negative lookups

We do? They should be cached in the underlying layer's dcache. If
that's not enough, I can hash them, but I was trying to avoid doubling
the dcache size.

> * we've got buggered hardlinks (different inodes for those)

Yes, I had a note to do the lookup, but forgot.

> * it has never, ever been tried on -next (would do rather nasty
> things on that d_instantiate())

So this is just a proof of concept; I figured it was best to do it
against current rather than have people who wanted to try it pull in
your tree. I can respin it after the merge window closes.

>
> > +
> > + kfree(sfc);
> > +
> > + return err;
> > +}
>
> > + file->f_op = &sfc->fop;
>
> Lovely - now try that with underlying fs something built modular.
>
> Or try to use it on top of something with non-trivial dentry_operations
> (hell, on top of itself, for starters).

So if I add the missing fops_get/put, you're happy with the way this
hijacks f_op and f_inode?
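
To make sure we mean the same problem, here is a toy userspace model of
why a borrowed file_operations has to pin the owner of the underlying
fs. This is not kernel code: every toy_* name is invented for
illustration, and it assumes sfc->fop starts out as a copy of the
underlying f_op. The real fops_get() can also fail when the module is
already going away, which the toy ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct module: only the reference count is modelled. */
struct toy_module {
	int refcnt;
};

/* Toy stand-in for struct file_operations: only the owner matters here. */
struct toy_fops {
	struct toy_module *owner;	/* NULL models a built-in filesystem */
};

/* Models fops_get(): pin the owning module before using its ops table. */
static const struct toy_fops *toy_fops_get(const struct toy_fops *fops)
{
	if (fops && fops->owner)
		fops->owner->refcnt++;
	return fops;
}

/* Models fops_put(): drop the pin taken by toy_fops_get(). */
static void toy_fops_put(const struct toy_fops *fops)
{
	if (fops && fops->owner)
		fops->owner->refcnt--;
}

/* Models the unload check: a module may only go away with no users. */
static bool toy_module_can_unload(const struct toy_module *mod)
{
	return mod->refcnt == 0;
}

/* Walks through the unpinned copy (the bug) and then the pinned case. */
static void toy_demo(void)
{
	struct toy_module mod = { .refcnt = 0 };
	struct toy_fops real = { .owner = &mod };
	struct toy_fops copy = real;	/* plain copy, no reference taken */

	/* Bug: the copy is still in use, yet nothing stops the unload. */
	assert(copy.owner == &mod);
	assert(toy_module_can_unload(&mod));

	/* With a get/put pair the module stays pinned while in use. */
	const struct toy_fops *pinned = toy_fops_get(&real);
	assert(!toy_module_can_unload(&mod));
	toy_fops_put(pinned);
	assert(toy_module_can_unload(&mod));
}
```

In the toy, the plain struct copy leaves refcnt at zero, which is the
modular-fs case you pointed at; the get/put pair is what closes it.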

James