Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount
From: James Bottomley
Date: Thu Feb 09 2017 - 11:29:05 EST
On Thu, 2017-02-09 at 02:36 -0800, Josh Triplett wrote:
> On Wed, Feb 08, 2017 at 07:22:45AM -0800, James Bottomley wrote:
> > On Tue, 2017-02-07 at 17:54 -0800, Josh Triplett wrote:
> > > On Tue, Feb 07, 2017 at 11:49:33AM -0800, Christoph Hellwig
> > > wrote:
> > > > On Tue, Feb 07, 2017 at 11:02:03AM -0800, James Bottomley
> > > > wrote:
> > > > > > Another option would be to require something like a
> > > > > > project as used for project quotas as the root. This would
> > > > > > also be convenient as it could store the used remapping
> > > > > > tables.
> > > > >
> > > > > So this would be like the current project quota except set on
> > > > > a subtree? I could see it being done that way but I don't
> > > > > see what advantage it has over using flags in the subtree
> > > > > itself (the mapping is known based on the mount namespace, so
> > > > > there's really only a single bit of information to store).
> > > >
> > > > projects (which are the underlying concept for project quotas)
> > > > are per-subtree in practice - the flag is set on an inode and
> > > > then all directories and files underneath inherit the project
> > > > ID; hardlinking outside a project is prohibited.
> > >
> > > I'm interested in having a VFS-level way to do more than just a
> > > shift; I'd like to be able to arbitrarily remap IDs between
> > > what's on disk and the system IDs.
> >
> > OK, so the shift is effectively an arbitrary remap because it
> > allows multiple ranges to be mapped (although the userns currently
> > imposes a maximum of five extents, that limit is a bit arbitrary
> > and exists just to limit the amount of space the parametrisation
> > takes). See
> > kernel/user_namespace.c:map_id_up/down()
> >
> > > If we're talking about developing a VFS-level solution for
> > > this, I'd like to avoid limiting it to just a shift. (A
> > > shift/range would definitely be the simplest solution for many
> > > common container cases, but not all.)
> >
> > I assume the above satisfies you on this point, but raises the
> > question: do you want an arbitrary shift not parametrised by a user
> > namespace? If so how many such shifts do you want ... giving some
> > details of the use case would be helpful.
>
> The limit of five extents means this may not work in the most general
> case, no.
That's not an API limit, so it can be changed if there's a need. The
problem was merely how to parametrise a mapping without taking too much
space.
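To make the extent parametrisation concrete, here is a rough user-space
model of the lookup that map_id_down() and map_id_up() perform. It is a
simplified Python sketch, not the kernel code: the field names first,
lower_first and count follow struct uid_gid_extent, but the Extent tuple,
the function signatures and the None return for an unmapped id are
assumptions made for this example.

```python
from typing import NamedTuple, Optional, Sequence


class Extent(NamedTuple):
    """One contiguous range of the mapping (hypothetical user-space model)."""
    first: int        # first id as seen inside the namespace
    lower_first: int  # first id as seen by the parent / kernel view
    count: int        # number of consecutive ids covered


def map_id_down(extents: Sequence[Extent], id_: int) -> Optional[int]:
    """Map a namespace id to the lower view, or None if it is unmapped."""
    for e in extents:
        if e.first <= id_ < e.first + e.count:
            return e.lower_first + (id_ - e.first)
    return None


def map_id_up(extents: Sequence[Extent], id_: int) -> Optional[int]:
    """Map a lower-view id back into the namespace, or None if unmapped."""
    for e in extents:
        if e.lower_first <= id_ < e.lower_first + e.count:
            return e.first + (id_ - e.lower_first)
    return None


# Two ranges: ids 0..999 shifted to 100000.., ids 5000..5009 to 200000..
example = [Extent(0, 100000, 1000), Extent(5000, 200000, 10)]
print(map_id_down(example, 999))
print(map_id_up(example, 200003))
```

Each extent costs three integers, and the lookup is a linear scan, which
is why a small fixed number of extents keeps the parametrisation compact.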
> One use case: given an on-disk filesystem, its name-to-number
> mapping, and your host name-to-number mapping, mount the filesystem
> with all the UIDs bidirectionally mapped to those on your host
> system.
This is pretty much what the s_user_ns does.
> Another use case: given an on-disk filesystem with potentially
> arbitrary UIDs (not necessarily in a clean contiguous block), and a
> pile of unprivileged UIDs, mount the filesystem such that every
> on-disk UID gets a unique unprivileged UID.
So is this. Basically anything that begins by mounting gets a super
block and can use the s_user_ns to map from the filesystem view to the
kernel view of ids. Apart from greater sophistication in the
parametrisation, it sounds like we have all the machinery you need.
I'm sure the containers people will consider reasonable patches to
change this.
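As a rough illustration of why the parametrisation matters for the
second use case, the sketch below turns a set of scattered on-disk UIDs
into extents, giving each one a unique id from a block of unprivileged
ids. The function name build_extents, the pool_start parameter and the
(first, lower_first, count) tuple layout are all hypothetical and exist
only for this example; only the five-extent figure comes from the
discussion above.

```python
from typing import Iterable, List, Tuple

# (first, lower_first, count): on-disk start, host start, range length
ExtentTuple = Tuple[int, int, int]


def build_extents(on_disk_ids: Iterable[int], pool_start: int) -> List[ExtentTuple]:
    """Assign each on-disk id a unique host id starting at pool_start.

    Consecutive on-disk ids are coalesced into one extent, because they
    also receive consecutive host ids.
    """
    extents: List[ExtentTuple] = []
    next_free = pool_start
    for uid in sorted(set(on_disk_ids)):
        if extents and extents[-1][0] + extents[-1][2] == uid:
            first, lower_first, count = extents[-1]
            extents[-1] = (first, lower_first, count + 1)  # extend the run
        else:
            extents.append((uid, next_free, 1))            # start a new run
        next_free += 1
    return extents


# Scattered ids: every gap in the on-disk id space costs one more extent.
scattered = [0, 1, 2, 33, 100, 1000, 1001, 5000, 65534]
needed = build_extents(scattered, pool_start=100000)
print(len(needed), "extents needed; over the five-extent limit:", len(needed) > 5)
```

A filesystem with ids in a few contiguous blocks fits easily, while one
with many isolated ids needs roughly one extent per id.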
James
> (I have some additional use cases, but they would require the ability
> to extend the mapping on the fly without remounting.)
>