Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount

From: James Bottomley
Date: Tue Feb 07 2017 - 13:20:18 EST


On Tue, 2017-02-07 at 19:59 +0200, Amir Goldstein wrote:
> On Tue, Feb 7, 2017 at 6:37 PM, James Bottomley
> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > On Tue, 2017-02-07 at 01:19 -0800, Christoph Hellwig wrote:
> > > On Sat, Feb 04, 2017 at 11:19:32AM -0800, James Bottomley wrote:
> > > > This allows any subtree to be uid/gid shifted and bound elsewhere.
> > > > It does this by operating similarly to overlayfs. Its primary use
> > > > is for shifting the underlying uids of filesystems used to support
> > > > unprivileged (uid shifted) containers. The usual use case here is
> > > > that the container is operating with a uid shifted unprivileged
> > > > root but sometimes needs to make use of or work with a filesystem
> > > > image that has root at real uid 0.
> > > >
> > > > The mechanism is to allow any subordinate mount namespace to mount
> > > > a shiftfs filesystem (by marking it FS_USERNS_MOUNT) but only
> > > > allowing it to mount marked subtrees (using the -o mark option as
> > > > root). Once mounted, the subtree is mapped via the super block
> > > > user namespace so that the interior ids of the mounting user
> > > > namespace are the ids written to the filesystem.
> > >
> > > Please move this into VFS instead of a stackable fs. We might need
> > > additional parameters to getattr/setattr to specify the ID
> > > translation, but that's way better than a horrible hack like this.
> >
> > I would need a lot more than that: getattr controls the cosmetic
> > permission display to the user, but enforcement is done in the core
> > permission checks which are inode based. To make this a real bind
> > mount, the core permission checks will have to become subtree aware
> > because knowledge of whether we need a uid shift in the permission
> > check becomes a subtree property. Effectively inode_permission would
> > become dentry_permission and generic_permission would take a dentry
> > instead of an inode. This will be a huge amount of VFS and underlying
> > filesystem churn, since the permissions calls are threaded through a
> > huge chunk of code.
> >
>
> I am not even sure that would be enough.
> dentry does not contain information about the mount the user came from,
> and sb contains only information about the user ns of the mounter of
> the file system, not the mounter of the bind mount, right?
> I think I am missing some big pieces of the big picture.
> Would love to hear what Eric has to say.

I'm not really sure until it gets prototyped, but I think the
filesystem user namespace would also have to become a subtree property.

The whole reason for shiftfs being a properly mounted filesystem is
because it needs a super block to capture the namespace it's being
mounted in.
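
As a rough illustration of what capturing the namespace buys us, here is
a user-space sketch in C, not kernel code. Every name in it (toy_extent,
toy_userns, toy_sb, toy_mount and so on) is made up for this example,
and the extent layout only mimics a line of /proc/<pid>/uid_map. The
super block stand-in records the mounter's namespace once, and ids are
then translated through it in both directions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_ID UINT32_MAX

/* One extent of an id map: `count` ids starting at `inner_first` inside
 * the namespace correspond to ids starting at `outer_first` outside it. */
struct toy_extent {
    uint32_t inner_first;
    uint32_t outer_first;
    uint32_t count;
};

/* Hypothetical stand-in for a user namespace: just a list of extents. */
struct toy_userns {
    const struct toy_extent *extents;
    size_t nr_extents;
};

/* Hypothetical stand-in for the super block: all it does in this sketch
 * is remember which namespace performed the mount. */
struct toy_sb {
    const struct toy_userns *ns;
};

struct toy_sb toy_mount(const struct toy_userns *mounter_ns)
{
    assert(mounter_ns != NULL);
    struct toy_sb sb = { .ns = mounter_ns };
    return sb;
}

/* Host-side id -> interior id, i.e. the id that would be written to the
 * underlying filesystem. Returns TOY_INVALID_ID if the id is unmapped. */
uint32_t toy_id_to_disk(const struct toy_sb *sb, uint32_t outer)
{
    for (size_t i = 0; i < sb->ns->nr_extents; i++) {
        const struct toy_extent *e = &sb->ns->extents[i];
        if (outer >= e->outer_first && outer - e->outer_first < e->count)
            return e->inner_first + (outer - e->outer_first);
    }
    return TOY_INVALID_ID;
}

/* Id found on the underlying filesystem -> host-side id seen through
 * the shifted mount. Returns TOY_INVALID_ID if the id is unmapped. */
uint32_t toy_id_from_disk(const struct toy_sb *sb, uint32_t inner)
{
    for (size_t i = 0; i < sb->ns->nr_extents; i++) {
        const struct toy_extent *e = &sb->ns->extents[i];
        if (inner >= e->inner_first && inner - e->inner_first < e->count)
            return e->outer_first + (inner - e->inner_first);
    }
    return TOY_INVALID_ID;
}
```

With a namespace that maps interior ids 0..65535 to host ids
100000..165535, toy_id_to_disk() on host id 100000 gives 0, which is
what would land on the underlying filesystem, and toy_id_from_disk() on
0 gives 100000 back.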

However, when you have a container that you want remapping inside, you
must have a user namespace which owns a mount namespace, so we can
deduce the information from the mount namespace. All we probably need
the subtree to tell us is if we're shifting or not.
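
To make that last point concrete, here is a small user-space C sketch,
not kernel code, with hypothetical names throughout (toy_userns,
toy_mnt_ns, toy_subtree, toy_is_owner). It assumes the user namespace
has a single contiguous id range, that the mount namespace records its
owning user namespace, and that the subtree carries nothing but a
"shifted" flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified user namespace: interior ids 0..count-1 correspond to host
 * ids host_base..host_base+count-1. */
struct toy_userns {
    uint32_t host_base;
    uint32_t count;
};

/* The mount namespace is owned by a user namespace, so the mapping can
 * be reached from it without asking the subtree. */
struct toy_mnt_ns {
    const struct toy_userns *owner;
};

/* The only thing the subtree contributes: shift or do not shift. */
struct toy_subtree {
    bool shifted;
};

/* Does the caller (identified by its host-side uid) own a file whose
 * on-disk uid is disk_uid, when reached through this subtree? */
bool toy_is_owner(const struct toy_mnt_ns *mnt_ns,
                  const struct toy_subtree *tree,
                  uint32_t disk_uid, uint32_t caller_host_uid)
{
    assert(mnt_ns != NULL && mnt_ns->owner != NULL && tree != NULL);

    /* Unshifted subtree: on-disk ids are compared as they are. */
    if (!tree->shifted)
        return disk_uid == caller_host_uid;

    /* Shifted subtree: on-disk ids are interior ids of the namespace
     * that owns the mount namespace; unmapped ids match nobody. */
    const struct toy_userns *ns = mnt_ns->owner;
    if (disk_uid >= ns->count)
        return false;
    return ns->host_base + disk_uid == caller_host_uid;
}
```

In this sketch a file with on-disk uid 0 is owned by host uid 100000
when reached through a shifted subtree (with host_base 100000), and by
host uid 0 otherwise. A real permission check involves far more than an
owner comparison; the point is only where the two pieces of information
come from.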

James