Re: [RFC v2 PATCH 0/8] VFS:userns: support portable root filesystems

From: Andy Lutomirski
Date: Wed May 04 2016 - 21:44:39 EST

On Wed, May 4, 2016 at 5:23 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Wed, May 04, 2016 at 04:26:46PM +0200, Djalal Harouni wrote:
>> This is version 2 of the VFS:userns support portable root filesystems
>> RFC. Changes since version 1:
>> * Update documentation and remove some ambiguity about the feature.
>> Based on Josh Triplett comments.
>> * Use a new email address to send the RFC :-)
>> This RFC tries to explore how to support filesystem operations inside
>> user namespace using only VFS and a per mount namespace solution. This
>> allows to take advantage of user namespace separations without
>> introducing any change at the filesystems level. All this is handled
>> with the virtual view of mount namespaces.
> [...]
>> As an example if the mapping 0:65535 inside mount namespace and outside
>> is 1000000:1065536, then 0:65535 will be the range that we use to
>> construct UIDs/GIDs mapping into init_user_ns and use it for on-disk
>> data. They represent the persistent values that we want to write to the
>> disk. Therefore, we don't keep track of any UID/GID shift that was applied
>> before, it gives portability and allows to use the previous mapping
>> which was freed for another root filesystem...
> So let me get this straight. Two /isolated/ containers, different
> UID/GID mappings, sharing the same files and directories. Create a
> new file in a writeable directory in container 1, namespace
> information gets stripped from on-disk uid/gid representation.

I think the intent is a totally separate superblock for each
container. Djalal, am I right?

The feature that seems to me to be missing is the ability to squash
uids. I can imagine desktop distros wanting to mount removable
storage such that everything shows up (to permission checks and
stat()) as the logged-in user's uid while the filesystem itself sees
0:0. Shifting handles the files stored as 0:0, but the distro would
want everything else on the filesystem to show up as the logged-in
user as well.
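
To make the shift-versus-squash distinction concrete, here is a rough
Python model of the two translations. It is only a sketch: the function
names are made up for illustration and do not correspond to anything in
the kernel or in this patch set. The one real-world value is 65534, the
default overflow uid reported for ids that have no mapping.

```python
# Illustrative model only; none of these names exist in the kernel.
OVERFLOW_ID = 65534  # default overflow id shown for unmapped uids/gids


def shift_to_view(disk_id, disk_base, view_base, count):
    """Shift: on-disk ids [disk_base, disk_base + count) map one-to-one
    onto [view_base, view_base + count); anything else is unmapped."""
    offset = disk_id - disk_base
    if 0 <= offset < count:
        return view_base + offset
    return OVERFLOW_ID


def squash_to_view(disk_id, user_id):
    """Squash: every on-disk id is reported as the logged-in user."""
    return user_id


def squash_to_disk(view_id, stored_id=0):
    """Squash, write side: whatever id the caller has, the filesystem
    stores stored_id (0 in the removable-media case)."""
    return stored_id


# Removable disk mounted for uid 1000 with a one-entry shift 0 -> 1000.
print(shift_to_view(0, 0, 1000, 1))    # root-owned file: shown as 1000
print(shift_to_view(500, 0, 1000, 1))  # file owned by 500: unmapped
print(squash_to_view(500, 1000))       # squash: shown as 1000 anyway
```

With the one-entry shift, only files stored as 0 appear to belong to
the user; a file stored with any other id falls outside the map.
Squashing is the many-to-one case that a pure shift cannot express.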

That use case could also be handled by adding a way to tell a given
filesystem to completely opt out of normal access control rules and
just let a given user act as root wrt that filesystem (with the mount
being nosuid, of course). This would be a much greater departure from
current behavior, but would let normal users chown things on a
removable device, which is potentially nice.
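
As a rough illustration of that second idea, the following Python
sketch models a chown check in which one designated user is treated as
root for a single mount. Everything here is hypothetical: Mount,
owner_uid and may_chown are names invented for this sketch, and the
"ordinary" rule is deliberately simplified to "an unprivileged caller
cannot change a file's owner".

```python
from dataclasses import dataclass


@dataclass
class Mount:
    owner_uid: int       # hypothetical: user who acts as root on this fs
    nosuid: bool = True  # such a mount would be nosuid


@dataclass
class Inode:
    uid: int
    gid: int


def may_chown(mount, caller_uid, inode, new_uid):
    """Return True if caller_uid may set inode's owner to new_uid."""
    # Real root, or the user designated for this mount, may do anything.
    if caller_uid == 0 or caller_uid == mount.owner_uid:
        return True
    # Simplified ordinary rule: the owner may only "change" the uid to
    # the value it already has; nobody else may chown at all.
    return caller_uid == inode.uid and new_uid == inode.uid


usb = Mount(owner_uid=1000)
f = Inode(uid=0, gid=0)
print(may_chown(usb, 1000, f, 1234))  # designated user: allowed
print(may_chown(usb, 1001, f, 1001))  # some other user: refused
```

The point of the sketch is only that the privilege is scoped to one
mount: uid 1000 gets no extra rights on any other filesystem.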