Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount

From: James Bottomley
Date: Wed Feb 08 2017 - 10:25:29 EST


On Wed, 2017-02-08 at 08:44 +0200, Amir Goldstein wrote:
> On Wed, Feb 8, 2017 at 1:42 AM, James Bottomley
[...]
> > So I've been thinking about how to do this without subtree marking
> > and yet retain the subtree properties similar to project id. The
> > advantage would be that if it can be done using only inode
> > properties, then none of the permission prototypes need change.
> > The only real subtree property we need is ability to bind into an
> > unprivileged mount namespace, but we already have that. The gotcha
> > about marking inodes is that they're all or nothing, so every
> > subtree that gets access to the inode inherits the mark. This
> > means that we cannot allow a user access to a marked inode without
> > the cover of an unprivileged user namespace, but I think that's
> > fixable in the permission check (basically if the inode is marked
> > you *only* get access if you have a user_ns != init_user_ns and we
> > do the permission shifts or you have user_ns == init_user_ns and
> > you are admin capable).
> >
>
> I didn't follow, but it sounds like your proposed solutions is only
> good for single level of userns nesting. Do you think you can
> redefine it in terms of "container root projid".

I don't quite understand what you're getting at. user_ns mappings
nest, but what we see depends on where you're trying to look at it.
Let's take the kernel's view as the primary one. That's the kuid_t.
The user has a different view, the uid_t, and now we have the
filesystem view (no actual type for this). The user view is produced
from the kernel view by chaining up all the maps from the
current_user_ns, and the filesystem view is produced by doing the same
thing for the s_user_ns. So however many levels of user namespace
nesting we have operating, we only have three views of what an id is:
the user view, the kernel view and the filesystem view. All nesting
does is change how those views are mapped but it doesn't alter the
number of views.
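
To make the "three views" point concrete, here is a minimal userspace
sketch. It is not kernel code: struct toy_ns, toy_to_kernel() and
toy_from_kernel() are hypothetical names invented for illustration, and
each namespace is assumed to carry a single contiguous mapping extent
onto its parent. A NULL namespace pointer stands for the initial
namespace, where ids are already kernel ids.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ID_INVALID ((uint32_t)-1)

/*
 * Toy namespace (hypothetical, for illustration only): ids
 * [inner, inner + count) inside the namespace correspond to
 * [outer, outer + count) in the parent. parent == NULL means the
 * parent is the initial namespace.
 */
struct toy_ns {
	const struct toy_ns *parent;
	uint32_t inner;
	uint32_t outer;
	uint32_t count;
};

/* Id as seen inside ns -> kernel view, applying each level's map in turn. */
static uint32_t toy_to_kernel(const struct toy_ns *ns, uint32_t id)
{
	for (; ns != NULL; ns = ns->parent) {
		if (id < ns->inner || id - ns->inner >= ns->count)
			return TOY_ID_INVALID;
		id = ns->outer + (id - ns->inner);
	}
	return id;
}

/* Kernel view -> id as seen inside ns (the same chain, walked downwards). */
static uint32_t toy_from_kernel(const struct toy_ns *ns, uint32_t kid)
{
	uint32_t pid;

	if (ns == NULL)
		return kid;	/* initial namespace: view == kernel view */

	pid = toy_from_kernel(ns->parent, kid);
	if (pid == TOY_ID_INVALID || pid < ns->outer ||
	    pid - ns->outer >= ns->count)
		return TOY_ID_INVALID;
	return ns->inner + (pid - ns->outer);
}
```

In this sketch the user view of a kernel id is toy_from_kernel() called
with the caller's namespace, and the filesystem view is the same
function called with the superblock's namespace. Adding another nesting
level only lengthens the chain; the caller still sees just the kernel
view and the two derived views.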

What the original shiftfs patches (not the ones that use s_user_ns) did
was to introduce effectively an inode view and map between the kernel
and the inode view using the shift mapping parameters; then the inode
view would get mapped through the s_user_ns to become the filesystem
view. In the s_user_ns version of shiftfs (the current patches),
there's still an inode view, but we know that what we want to write to
disk is the user view, so effectively the user view and the inode view
become the same if the filesystem is marked; if it isn't, the inode
view and the kernel view are the same. That's why I only need a
single bit to tell me if I'm mapping or not, and there are two separate
regimes to check the permissions in: user == inode view and
kernel == inode view.
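
As a rough illustration of the single bit and the two regimes, the
following userspace sketch picks which view of the caller's id gets
compared with the inode's id. It is not taken from the shiftfs patches:
struct toy_cred, struct toy_inode and toy_owner_match() are hypothetical
names, and the admin-only rule for callers in the initial namespace is
an assumption based on the permission check described in the quoted
text earlier in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical caller credentials, already resolved into both views. */
struct toy_cred {
	uint32_t kernel_id;	/* kernel view of the caller's id */
	uint32_t user_id;	/* user view of the caller's id */
	bool in_init_ns;	/* caller is in the initial user namespace */
	bool admin;		/* caller is admin capable */
};

/* Hypothetical inode: an owner id plus the single "marked" bit. */
struct toy_inode {
	uint32_t owner;		/* id in the inode view */
	bool marked;		/* true: inode view == user view */
};

/* Does the caller count as the owner of this inode? */
static bool toy_owner_match(const struct toy_inode *inode,
			    const struct toy_cred *cred)
{
	if (inode->marked) {
		/*
		 * Marked regime: user == inode view. A caller in the
		 * initial namespace only gets access if admin capable.
		 */
		if (cred->in_init_ns)
			return cred->admin;
		return inode->owner == cred->user_id;
	}

	/* Unmarked regime: kernel == inode view. */
	return inode->owner == cred->kernel_id;
}
```

The only state the check needs beyond the ids themselves is the marked
bit, which selects the regime.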

James