Re: [PATCH v4 0/2] cgroup: allow management of subtrees by new cgroup namespaces

From: James Bottomley
Date: Fri May 20 2016 - 13:28:54 EST


On Fri, 2016-05-20 at 12:53 -0400, Tejun Heo wrote:
> Hello,
>
> On Fri, May 20, 2016 at 12:25:09PM -0400, James Bottomley wrote:
> > OK, so is the only problem cleanup? If so, what if I proposed that a
>
> For generic cases, it's a much larger problem. We'd have to change the
> delegation model completely so that delegations are allowed by
> default, which btw can't be allowed on v1 hierarchies as some
> controllers don't behave properly hierarchically in v1 and would
> allow unpriv users to escape the constraints of their ancestors.

Just so I'm clear: by delegation you mean creating a subdirectory in the
cgroup hierarchy with a non-root owner? We may have a solution for the
escape constraints problem: see below.
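For reference, delegation in that sense can be sketched roughly as follows. This is a minimal Python sketch, assuming a cgroup v2 mount and the delegation model described in the kernel's cgroup-v2 documentation: root enables the wanted controllers in the parent's cgroup.subtree_control, creates the child directory, and hands the directory plus its cgroup.procs, cgroup.threads and cgroup.subtree_control files to the unprivileged user. The function name delegate_subtree and its parameters are hypothetical, and cgroup.threads only exists on newer kernels, so missing files are skipped.

```python
import os

# Interface files the cgroup v2 delegation model hands to the delegatee
# (cgroup.threads only exists on kernels with threaded cgroup support).
DELEGATED_FILES = ("cgroup.procs", "cgroup.threads", "cgroup.subtree_control")


def delegate_subtree(parent: str, name: str, uid: int, gid: int,
                     controllers: list[str]) -> str:
    """Hypothetical helper: delegate a new child cgroup of `parent` to uid/gid.

    Must run with enough privilege to write `parent` and to chown.
    """
    # On v2, controllers must be enabled in the parent's subtree_control
    # before the child cgroup gets access to them.
    if controllers:
        with open(os.path.join(parent, "cgroup.subtree_control"), "w") as f:
            f.write(" ".join("+" + c for c in controllers))

    child = os.path.join(parent, name)
    os.mkdir(child)  # on cgroupfs the kernel populates the interface files

    # Hand the directory and the delegation-relevant files to the new owner.
    os.chown(child, uid, gid)
    for fname in DELEGATED_FILES:
        path = os.path.join(child, fname)
        if os.path.exists(path):
            os.chown(path, uid, gid)
    return child
```

The delegatee can then create further subdirectories under the child and enable controllers there itself, but only among those root already enabled in the parent.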

> > cgroup directory could only be created by the owner of the userns
> > (which would be any old unprivileged user) iff they create a cgroup
> > ns and the cgroup ns would be responsible for removing it again, so
> > the cgroup subdirectory would be tied to the cgroup namespace as
> > its holder and we'd use release of the cgroup to remove all the
> > directories?
>
> Unfortunately, cgroup hierarchy isn't designed to support this sort
> of automatic delegation. Unpriv processes would be able to escape
> constraints on v1 with some controllers, and on v2, controllers have to
> be explicitly enabled by root for a delegated scope to have access to
> them.

Not necessarily. We also talked about pinning the cgroup tree so that
once you enter the cgroup namespace, your current cgroup directory
becomes your root. You then can't cd back into the ancestors and so
can't write their tasks files which, I think, should make it
impossible to escape ancestor constraints.
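The pinning argument rests on how a cgroup namespace presents paths: cgroups at or below the namespace root appear relative to it, and anything else is only expressible through leading "/.." components (a sibling shows up in /proc/PID/cgroup as something like /../container_id2/sub_cgrp_1). Below is a purely lexical userspace model of that mapping, assuming absolute, already-normalized host paths; it does not touch the kernel, and ns_path and visible_in_ns are hypothetical names.

```python
import posixpath


def ns_path(ns_root: str, cgroup: str) -> str:
    """Path of `cgroup` as displayed inside a cgroup ns rooted at `ns_root`."""
    rel = posixpath.relpath(cgroup, ns_root)
    return "/" if rel == "." else "/" + rel


def visible_in_ns(ns_root: str, cgroup: str) -> bool:
    """True if `cgroup` is the namespace root or one of its descendants."""
    rel = posixpath.relpath(cgroup, ns_root)
    return rel != ".." and not rel.startswith("../")
```

Anything for which visible_in_ns() is false, including every ancestor's tasks and cgroup.procs files, has no path under a cgroupfs mounted from inside the namespace. Mounts inherited from outside the namespace are a separate matter this model ignores.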

> We can try to isolate these delegated subtrees and make them
> work transparently, which rgroup tried to do, but that collides
> directly with the vfs conventions (rgroups don't show up in the cgroup
> hierarchy at all and so avoid this problem).

Well, let's see if we can solve it within the current framework first.

>
> Why does an unpriv NS need to have a cgroup delegated to it without
> cooperation from the cgroup manager?

There are actually many answers to this. The one I'm interested in is
the ability for applications to make use of container features without
having to ask permission from some orchestration engine. The problem
most people are looking at is how to prevent the cgroup manager from
running as root, because that's a security problem waiting to happen.

> If it's for resource control, I'm pretty sure we don't want to allow
> that without explicit cooperation from the enclosing scope.

The enclosing scope should be allowed to define the parameters (as happens
today with namespaces) but there shouldn't be an active "thing" which
is the permission gateway.

> Overall, it feels like this is trying to work around an issue which
> should be solved from userland.

So it's not impossible to have some setuid (or CAP_ scoped) universal
binary do this. We do this today for the user namespace range of uids
problem. However, it would have to be something that operated
independently of the cgroup manager, since every container
orchestration system wants to be its own cgroup manager, so there's
no one true one.
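The existing setuid precedent for uid ranges is shadow-utils' newuidmap, which checks a request against the ranges allotted to the caller in /etc/subuid (lines of the form name:start:count) before writing /proc/PID/uid_map (lines of the form "inside outside count"). A cgroup helper independent of any manager would presumably need a comparable allotment check. Below is a minimal sketch of just that check; parse_subuid and build_uid_map_line are hypothetical names, and the real newuidmap performs additional checks not shown here.

```python
def parse_subuid(text: str, user: str) -> list[tuple[int, int]]:
    """Return the (start, count) ranges allotted to `user` in subuid-format text."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, start, count = line.split(":")
        if name == user:
            ranges.append((int(start), int(count)))
    return ranges


def build_uid_map_line(allotted: list[tuple[int, int]],
                       inside: int, outside: int, count: int) -> str:
    """Validate one requested mapping and format it as a uid_map line."""
    if count <= 0:
        raise ValueError("count must be positive")
    for start, length in allotted:
        # The whole host-side range [outside, outside + count) must fit
        # inside a single allotted range.
        if start <= outside and outside + count <= start + length:
            return f"{inside} {outside} {count}"
    raise PermissionError("requested range is not delegated to this user")
```

The point of the design is that the privileged helper only enforces a policy root wrote down in advance; it does not need to be the orchestration system's manager.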

James