Re: [RFC 1/1] shiftfs: uid/gid shifting bind mount

From: Amir Goldstein
Date: Wed Feb 08 2017 - 01:52:51 EST

On Wed, Feb 8, 2017 at 1:42 AM, James Bottomley
<James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Tue, 2017-02-07 at 14:25 -0800, Christoph Hellwig wrote:
>> On Tue, Feb 07, 2017 at 11:01:29PM +0200, Amir Goldstein wrote:
>> > Project id's are not exactly "subtree" semantic, but inheritance
>> > semantics,
>> > which is not the same when non empty directories get their project
>> > id changed.
>> > Here is a recap:
>> >
>> Yes - but if we abuse them for containers we could refine the
>> semantics to simply not allow change of project ids from inside
>> containers based on say capabilities.

You mean something like this:

With the suggested protected_projects, projid 0 (also inside container)
gets a special meaning, much like user 0, so we may do interesting
things with the projid that is mapped to 0.

> We can't really abuse projectid, it's part of the user namespace
> mapping (for project quota). What we can do is have a new id that
> behaves like it.

Perhaps we *can* use projid without abusing it.
userns already maps projids, but there is no concept of "owning project"
for a userns, nor does it make a lot of sense, because projid is not
part of the credentials.
But if we re-brand it as "container root projid", we can try to use it
for defining semantics to grant unprivileged access to a subtree.

The functionality you are trying to get with the shiftfs mark does
sound a bit like "container root projid":
- inodes with mapped projid MAY be uid/gid shifted
- inodes with unmapped projid MAY NOT
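
To make the two rules above concrete, here is a rough user-space C model
of what I have in mind. Everything in it is hypothetical: the struct and
function names are made up for illustration and are not kernel APIs, and
each namespace carries a single projid extent for brevity. An inode's
projid counts as "mapped" only if it translates through every level of
nesting down to the namespace doing the access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model only; none of these names exist in the kernel. */
struct id_extent {
	uint32_t first;       /* first id as seen inside the namespace */
	uint32_t lower_first; /* first id as seen in the parent namespace */
	uint32_t count;       /* number of ids covered by the extent */
};

struct userns_model {
	const struct userns_model *parent; /* NULL for the initial namespace */
	struct id_extent projid_map;       /* one extent, for brevity */
};

/* Translate a projid from the parent's view into ns; false if unmapped. */
static bool projid_from_parent(const struct userns_model *ns,
			       uint32_t lower, uint32_t *out)
{
	const struct id_extent *e = &ns->projid_map;

	if (lower < e->lower_first || lower - e->lower_first >= e->count)
		return false;
	*out = e->first + (lower - e->lower_first);
	return true;
}

/* Translate an on-disk projid through every level of nesting down to ns. */
static bool projid_mapped_in(const struct userns_model *ns,
			     uint32_t kprojid, uint32_t *out)
{
	uint32_t in_parent;

	if (!ns->parent) {
		/* Initial namespace: identity mapping. */
		*out = kprojid;
		return true;
	}
	if (!projid_mapped_in(ns->parent, kprojid, &in_parent))
		return false;
	return projid_from_parent(ns, in_parent, out);
}

/* Mapped projid: uid/gid MAY be shifted. Unmapped projid: MAY NOT. */
static bool may_shift(const struct userns_model *ns, uint32_t inode_projid)
{
	uint32_t unused;

	return projid_mapped_in(ns, inode_projid, &unused);
}
```

For example, with a container mapping on-disk projids 1000-1009 to 0-9,
and a nested container mapping the outer 5-6 to 0-1, an inode with projid
1005 is shiftable in both, while projid 1003 is shiftable only in the
outer one. The projid that ends up mapped to 0 would be the natural
candidate for the "container root projid".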

I realize this may be very raw, but it's a start. If you like this
direction we can try to develop it.

> But like I said, we don't really need a full ID, it would basically just
> be a single bit mark to say remap or not when doing permission checks
> against this inode. It would follow some of the project id semantics
> (like inheritance from parent dir)

But a single bit would only work for a single level of userns nesting,
wouldn't it?

>> > I guess we should define the semantics for the required sub-tree
>> > marking, before we can talk about solutions.
>> Good plan.
> So I've been thinking about how to do this without subtree marking and
> yet retain the subtree properties similar to project id. The advantage
> would be that if it can be done using only inode properties, then none
> of the permission prototypes need change. The only real subtree
> property we need is ability to bind into an unprivileged mount
> namespace, but we already have that. The gotcha about marking inodes
> is that they're all or nothing, so every subtree that gets access to
> the inode inherits the mark. This means that we cannot allow a user
> access to a marked inode without the cover of an unprivileged user
> namespace, but I think that's fixable in the permission check
> (basically if the inode is marked you *only* get access if you have a
> user_ns != init_user_ns and we do the permission shifts or you have
> user_ns == init_user_ns and you are admin capable).

I didn't follow, but it sounds like your proposed solution is only
good for a single level of userns nesting.
Do you think you can redefine it in terms of "container root projid"?
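
In case it helps pin down the rule, this is my literal reading of the
quoted permission check as a small user-space C model. The names are
hypothetical and the two booleans merely stand in for the real user_ns
comparison and capability check. I may well be misreading the proposal.

```c
#include <stdbool.h>

/* Hypothetical model of the caller; not a kernel structure. */
struct task_model {
	bool in_init_user_ns; /* stands in for user_ns == init_user_ns */
	bool admin_capable;   /* stands in for the admin capability check */
};

enum mark_access {
	ACCESS_DENIED,    /* marked inode, no access at all */
	ACCESS_UNSHIFTED, /* ordinary permission checks, no shifting */
	ACCESS_SHIFTED    /* permission checks with uid/gid shifting */
};

static enum mark_access marked_inode_access(bool inode_marked,
					    const struct task_model *t)
{
	if (!inode_marked)
		return ACCESS_UNSHIFTED; /* unmarked inodes are unaffected */
	if (!t->in_init_user_ns)
		return ACCESS_SHIFTED;   /* unprivileged userns: shift */
	/* In init_user_ns, only an admin-capable task may touch the inode. */
	return t->admin_capable ? ACCESS_UNSHIFTED : ACCESS_DENIED;
}
```

The mark is a plain boolean here, so the check cannot tell one non-init
user namespace from another, which is why I suspect it only covers a
single level of nesting.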