Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount

From: Amir Goldstein
Date: Mon Feb 06 2017 - 01:59:13 EST


On Mon, Feb 6, 2017 at 3:18 AM, James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Sun, 2017-02-05 at 09:51 +0200, Amir Goldstein wrote:
>> On Sat, Feb 4, 2017 at 9:19 PM, James Bottomley
>> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
>> > This allows any subtree to be uid/gid shifted and bound elsewhere.
>> > It does this by operating similarly to overlayfs. Its primary use
>> > is for shifting the underlying uids of filesystems used to support
>> > unprivileged (uid-shifted) containers. The usual use case here is
>> > that the container is operating with a uid-shifted unprivileged
>> > root but sometimes needs to make use of or work with a filesystem
>> > image that has root at real uid 0.
>> >
>> > The mechanism is to allow any subordinate mount namespace to mount
>> > a shiftfs filesystem (by marking it FS_USERNS_MOUNT) but only
>> > allowing it to mount marked subtrees (using the -o mark option as
>> > root). Once mounted, the subtree is mapped via the super block
>> > user namespace so that the interior ids of the mounting user
>> > namespace are the ids written to the filesystem.
>> >
>> > Signed-off-by: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
>> >
>>
>> James,
>>
>> Allow me to point out some problems in this patch and offer a
>> slightly different approach.
>>
>> First of all, the subject says "uid/gid shifting bind mount", but
>> it's not really a bind mount. What it is is a stackable mount, and
>> two levels of stacking, no less.
>
> The reason for the description is to have it behave exactly like a bind
> mount. You can assert that a bind mount is, in fact, a stacked mount,
> but we don't currently treat it as one. I'm also not sure where your
> two levels come from?
>

A bind mount does not recurse back into VFS code; a stacked fs does.
And there is a limit on filesystem stacking depth
(FILESYSTEM_MAX_STACK_DEPTH, currently 2), which stacked filesystems
need to comply with.
Your proposed setup has 2 stacked fs: the mark shiftfs mounted by the
admin and the uid-shifting shiftfs mounted by the container user. Or
maybe I misunderstood.
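
To illustrate, this is roughly the accounting a stacked fs is expected
to do at mount time (just a sketch; shiftfs_check_stack_depth is a name
I made up here, but FILESYSTEM_MAX_STACK_DEPTH is the real constant
from include/linux/fs.h):

#include <linux/errno.h>
#include <linux/fs.h>		/* FILESYSTEM_MAX_STACK_DEPTH */
#include <linux/printk.h>

static int shiftfs_check_stack_depth(struct super_block *sb,
				     struct super_block *lower_sb)
{
	/* Each stacked layer sits one level above the fs below it. */
	sb->s_stack_depth = lower_sb->s_stack_depth + 1;

	/* With a limit of 2, a "mark" mount plus a uid-shifting mount
	 * already consumes the entire stacking budget. */
	if (sb->s_stack_depth > FILESYSTEM_MAX_STACK_DEPTH) {
		pr_err("shiftfs: fs stacking depth exceeded\n");
		return -EINVAL;
	}
	return 0;
}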


>> So one thing that is missing is incrementing sb->s_stack_depth, and
>> that also means that shiftfs cannot be used to recursively shift
>> uids in a child userns, if that was ever the intention.
>
> I can't think of a use case that would ever need that, but perhaps
> other container people can.
>
>> The other problem is that by forking overlayfs functionality,
>
> So this wouldn't really be the right way to look at it: shiftfs shares
> no code with overlayfs at all, so is definitely not a fork. The only
> piece of functionality it has which is similar to overlayfs is the way
> it does lookups via a new dentry cache. However, that functionality is
> not unique to overlayfs and if you look, you'll see that
> shiftfs_lookup() actually has far more in common with
> ecryptfs_lookup().

That's a good point. All stackable filesystems may share similar
problems and solutions (e.g. keeping st_ino/st_dev consistent).
Perhaps that calls for shared library code or more generic VFS code.
At the moment ecryptfs is not seeing much development, so everything
happens in overlayfs. If there is going to be more than one actively
developed stackable fs, we should look into that.
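
For reference, the shared shape of a stacked-fs lookup, which is
roughly what both ecryptfs_lookup() and shiftfs_lookup() do (a sketch
only; the stackfs_* helpers are placeholders, not real kernel API):

#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/fs.h>
#include <linux/namei.h>

/* Placeholder helpers, assumed for this sketch: */
struct dentry *stackfs_lower_dentry(struct dentry *dentry);
struct inode *stackfs_get_inode(struct super_block *sb,
				struct inode *lower_inode);

static struct dentry *stackfs_lookup(struct inode *dir,
				     struct dentry *dentry,
				     unsigned int flags)
{
	struct dentry *lower_parent = stackfs_lower_dentry(dentry->d_parent);
	struct dentry *lower_dentry;
	struct inode *inode = NULL;

	/* Resolve the name in the underlying fs (the lower directory
	 * must be locked around this; elided here for brevity). */
	lower_dentry = lookup_one_len(dentry->d_name.name, lower_parent,
				      dentry->d_name.len);
	if (IS_ERR(lower_dentry))
		return ERR_CAST(lower_dentry);

	/* Wrap the lower inode in an upper one; this is where each fs
	 * applies its twist: shifted ids, decrypted names, etc. */
	if (d_really_is_positive(lower_dentry))
		inode = stackfs_get_inode(dir->i_sb, d_inode(lower_dentry));

	return d_splice_alias(inode, dentry);
}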

>
>> shiftfs is going to miss out on overlayfs bug fixes related to cases
>> where user credentials differ from mounter credentials, like fd3220d
>> ("ovl: update S_ISGID when setting posix ACLs"). I am not sure that
>> this specific case is relevant to shiftfs, but there could be others.
>
> OK, so shiftfs doesn't have this bug and the reason why is
> illustrative: basically shiftfs does three things
>
> 1. lookups via a uid/gid shifted dentry cache
> 2. permission checks for inode operations, done on the underlying
> filesystem with shifted credentials
> 3. location marking for unprivileged mount
>
> I think we've already seen that 1. isn't from overlayfs but the
> functionality could be added to overlayfs, I suppose. The big problem
> is 2. The overlayfs code emulates the permission checks, which makes
> it rather complex (this is where you get your bugs like the above
> from). I did actually look at adding 2. to overlayfs on the theory
> that a single layer overlay might be closest to what this is, but
> eventually concluded I'd have to take the special cases and add a whole
> lot more to them ... it really would increase the maintenance burden
> substantially and make the code an unreadable rat's nest.
>

The use cases for uid shifting are still overwhelming for me.
I take your word for it that it's going to be a maintenance burden
to add this functionality to overlayfs.
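
Concretely, I understand point 2 above to boil down to something like
this (a sketch of the idea, not the actual patch; shiftfs_lower_inode()
and shiftfs_shifted_creds() are assumed helper names):

#include <linux/cred.h>
#include <linux/fs.h>

/* Placeholder helpers, assumed for this sketch: */
struct inode *shiftfs_lower_inode(struct inode *inode);
const struct cred *shiftfs_shifted_creds(struct super_block *sb);

static int shiftfs_permission(struct inode *inode, int mask)
{
	struct inode *lower_inode = shiftfs_lower_inode(inode);
	const struct cred *oldcred;
	int err;

	/* First check against the shifted ids this mount presents. */
	err = generic_permission(inode, mask);
	if (err)
		return err;

	/* Then repeat the check on the real inode, with credentials
	 * translated through the sb's user namespace, so the lower fs
	 * enforces its own rules instead of us emulating them. */
	oldcred = override_creds(shiftfs_shifted_creds(inode->i_sb));
	err = inode_permission(lower_inode, mask);
	revert_creds(oldcred);
	return err;
}

The permission logic then stays in the lower fs and the VFS, which is
why the class of "mounter creds vs. caller creds" emulation bugs does
not arise.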

> When you think about it this way, it becomes obvious that the clean
> separation is if shiftfs functionality is layered on top of overlayfs
> and when you do that, doing it as its own filesystem is more logical.
>

Yes, I agree with that statement. This is in line with the solution I
outlined at the end of my previous email, where a single-layer
overlayfs is used for the host "mark" mount, although I wonder if the
same couldn't be achieved with a bind mount?

in host:
mount -t overlay -o noexec,upperdir=<origin> container_visible <mark location>

in container:
mount -t shiftfs <mark location> <somewhere in my local mount ns>
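
And for comparison, the RFC's own two-step flow, spelled out via
mount(2) (a sketch; the paths are invented, and in reality the two
calls run in different namespaces and processes):

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	/* Host side (privileged): mark the subtree as shiftable,
	 * using the -o mark interface described in the RFC. */
	if (mount("/var/lib/img", "/var/lib/img", "shiftfs", 0, "mark"))
		perror("mark mount");

	/* Container side (inside the userns): mount the marked subtree;
	 * ids are then mapped through the sb's user namespace. */
	if (mount("/var/lib/img", "/mnt", "shiftfs", 0, NULL))
		perror("shift mount");
	return 0;
}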