Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount

From: Eric W. Biederman
Date: Mon Feb 13 2017 - 05:20:33 EST


James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> writes:

> On Thu, 2017-02-09 at 02:36 -0800, Josh Triplett wrote:
>> On Wed, Feb 08, 2017 at 07:22:45AM -0800, James Bottomley wrote:
>> > On Tue, 2017-02-07 at 17:54 -0800, Josh Triplett wrote:
>> > > On Tue, Feb 07, 2017 at 11:49:33AM -0800, Christoph Hellwig
>> > > wrote:
>> > > > On Tue, Feb 07, 2017 at 11:02:03AM -0800, James Bottomley
>> > > > wrote:
>> > > > > > Another option would be to require something like a
>> > > > > > project as used for project quotas as the root. This would
>> > > > > > also be convenient as it could store the used remapping
>> > > > > > tables.
>> > > > >
>> > > > > So this would be like the current project quota except set on
>> > > > > a subtree? I could see it being done that way but I don't
>> > > > > see what advantage it has over using flags in the subtree
>> > > > > itself (the mapping is known based on the mount namespace, so
>> > > > > there's really only a single bit of information to store).
>> > > >
>> > > > projects (which are the underlying concept for project quotas)
>> > > > are per-subtree in practice - the flag is set on an inode and
>> > > > then all directories and files underneath inherit the project
>> > > > ID; hardlinking outside a project is prohibited.
>> > >
>> > > I'm interested in having a VFS-level way to do more than just a
>> > > shift; I'd like to be able to arbitrarily remap IDs between
>> > > what's on disk and the system IDs.
>> >
>> > OK, so the shift is effectively an arbitrary remap because it
>> > allows multiple ranges to be mapped (although the userns currently
>> > imposes a maximum of five extents, that limit is somewhat
>> > arbitrary and exists only to limit the amount of space the
>> > parametrisation takes). See
>> > kernel/user_namespace.c:map_id_up/down()
>> >
>> > > If we're talking about developing a VFS-level solution for
>> > > this, I'd like to avoid limiting it to just a shift. (A
>> > > shift/range would definitely be the simplest solution for many
>> > > common container cases, but not all.)
>> >
>> > I assume the above satisfies you on this point, but raises the
>> > question: do you want an arbitrary shift not parametrised by a user
>> > namespace? If so how many such shifts do you want ... giving some
>> > details of the use case would be helpful.
>>
>> The limit of five extents means this may not work in the most general
>> case, no.
>
> That's not an API limit, so it can be changed if there's a need. The
> problem was merely how to parametrise a mapping without taking too much
> space.
>
>> One use case: given an on-disk filesystem, its name-to-number
>> mapping, and your host name-to-number mapping, mount the filesystem
>> with all the UIDs bidirectionally mapped to those on your host
>> system.
>
> This is pretty much what the s_user_ns does.
>
>> Another use case: given an on-disk filesystem with potentially
>> arbitrary UIDs (not necessarily in a clean contiguous block), and a
>> pile of unprivileged UIDs, mount the filesystem such that every
>> on-disk UID gets a unique unprivileged UID.
>
> So is this. Basically anything that begins by mounting gets a super
> block and can use the s_user_ns to map from the filesystem view to the
> kernel view of ids. Apart from greater sophistication in the
> parametrisation, it sounds like we have all the machinery you need.
> I'm sure the containers people will consider reasonable patches to
> change this.

Yes.

And to be clear, all of that is merged now, and it is mostly present
and hooked up in all filesystems without any shiftfs-like changes needed.

To use this with a given filesystem, a final pass is needed to verify
that the cases where an id does not map are handled cleanly.
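
As a rough userspace illustration of the extent-based mapping discussed
above, and of what "does not map" means for a caller, here is a minimal
sketch. It is not the kernel code: every name in it (sketch_extent,
sketch_map, sketch_map_down, sketch_map_up, SKETCH_INVALID_ID) is made
up for this example, and the five-entry array only mirrors the limit
mentioned earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; this is a sketch, not the kernel API. */
#define SKETCH_MAX_EXTENTS 5                 /* mirrors the five-extent limit from the thread */
#define SKETCH_INVALID_ID  ((uint32_t)-1)    /* sentinel returned when an id does not map */

struct sketch_extent {
	uint32_t first;        /* first id on the "upper" side of the mapping */
	uint32_t lower_first;  /* first id on the "lower" side of the mapping */
	uint32_t count;        /* number of consecutive ids covered */
};

struct sketch_map {
	uint32_t nr_extents;
	struct sketch_extent extent[SKETCH_MAX_EXTENTS];
};

/* Translate an upper id to a lower id, or report that it is unmapped. */
static uint32_t sketch_map_down(const struct sketch_map *map, uint32_t id)
{
	assert(map->nr_extents <= SKETCH_MAX_EXTENTS);
	for (uint32_t i = 0; i < map->nr_extents; i++) {
		const struct sketch_extent *e = &map->extent[i];

		/* id - first < count avoids overflow in first + count */
		if (id >= e->first && id - e->first < e->count)
			return e->lower_first + (id - e->first);
	}
	return SKETCH_INVALID_ID;
}

/* Translate a lower id back to an upper id, or report that it is unmapped. */
static uint32_t sketch_map_up(const struct sketch_map *map, uint32_t id)
{
	assert(map->nr_extents <= SKETCH_MAX_EXTENTS);
	for (uint32_t i = 0; i < map->nr_extents; i++) {
		const struct sketch_extent *e = &map->extent[i];

		if (id >= e->lower_first && id - e->lower_first < e->count)
			return e->first + (id - e->lower_first);
	}
	return SKETCH_INVALID_ID;
}
```

With two extents, say {0, 100000, 1000} and {5000, 200000, 10}, id 5003
maps down to 200003 and back up to 5003, while id 1000 falls in the gap
between the extents and comes back as SKETCH_INVALID_ID. That sentinel
is the case each caller has to check for rather than store or compare
blindly.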

Eric