Re: [PATCH v4 0/2] cgroup: allow management of subtrees by new cgroup namespaces

From: James Bottomley
Date: Fri May 20 2016 - 13:50:25 EST


On Fri, 2016-05-20 at 10:33 -0700, Aditya Kali wrote:
> On Fri, May 20, 2016 at 9:25 AM, James Bottomley
> <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, 2016-05-20 at 09:17 -0700, Tejun Heo wrote:
> > > Hello, James.
> > >
> > > On Fri, May 20, 2016 at 12:09:10PM -0400, James Bottomley wrote:
> > > > I think it's just different definitions. If you take on our
> > > > definition of being able to set up a container without any
> > > > admin intervention, do you see our problem: we can't get the
> > > > initial delegation of the hierarchy.
> > >
> > > Yeah, I can see the difference, but we can't solve that by
> > > special-casing the NS case.
> >
> > Great, we agree on the problem definition ... as I said, I'm not
> > saying this patch is the solution, but it gives us a starting point
> > for exploring whether there is a solution.
> >
> > > This is stemming from the fact that an unpriv application can't
> > > create its sub-cgroups without explicit delegation from the root
> > > and that has always been an explicit design choice.
> > > It's tied to who's responsible for cleanup afterwards and what
> > > happens when the process gets migrated to a different cgroup.
> > > The latter is an important issue on v1 hierarchies because
> > > migrating tasks sometimes is used as a way to control resource
> > > distribution.
> >
> > OK, so is the only problem cleanup? If so, what if I proposed that
> > a cgroup directory could only be created by the owner of the userns
> > (which would be any old unprivileged user) iff they create a cgroup
> > ns and the cgroup ns would be responsible for removing it again, so
> > the cgroup subdirectory would be tied to the cgroup namespace as
> > its holder and we'd use release of the cgroup to remove all the
> > directories?
> >
>
> The cgroup namespace doesn't own the resources in the cgroupns-root,
> and so I am not sure how it will be able to do the cleanup either. I.e.,
> even if all the processes in the cgroup ns die, it doesn't mean that
> the cgroupns-root they belonged to is available for cleanup. For this
> reason, one of the implicit design choices in cgroupns was that the
> cgroupns-root should already exist and the target process should
> already be moved to it (presumably by some admin process) before
> creating the cgroupns.
>
> Moreover, the subsystem controllers (cpu, memory, etc.) are oblivious
> to cgroup namespaces. So, for example, creating a new cgroup namespace
> doesn't affect the reclaim behavior.

That doesn't mean we can't give them an owning cgroup namespace. It's
more or less the way user ns works today for all the other namespaces.
However, let's try to arrive at a proposal that works before we start
thinking about the implementation.

> But, allowing creation/modification of sub-cgroups affects it. So I
> think allowing any unprivileged process to do that cannot be
> considered safe for now. Explicit approval from some admin process
> will still be needed (which can be given by chmod/chown today).

Well, that's the essence of the question. Can this be done in a safe
way for cgroups like it is for namespaces today?
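
For concreteness, the chmod/chown delegation mentioned above could look
roughly like the sketch below. It is only an illustration:
delegate_subtree and its parameters are made-up names, not an existing
interface, and it assumes a cgroup filesystem where mkdir creates the
cgroup and the membership file is called cgroup.procs.

```python
import os


def delegate_subtree(parent, name, uid, gid, control_files=("cgroup.procs",)):
    """Create parent/name and hand it to uid:gid (hypothetical helper).

    Meant to be run by the admin process; the unprivileged owner can then
    create sub-cgroups below the returned path.
    """
    path = os.path.join(parent, name)
    # On a cgroup filesystem, creating the directory creates the cgroup.
    os.makedirs(path, exist_ok=True)
    # Owning the directory is what allows mkdir/rmdir of children.
    os.chown(path, uid, gid)
    # Owning the membership file is what allows moving tasks in.
    for fname in control_files:
        fpath = os.path.join(path, fname)
        if os.path.exists(fpath):
            os.chown(fpath, uid, gid)
    return path
```

The point of the sketch is that every step is taken by a privileged
party on behalf of the user, which is exactly the admin intervention
we would like to be able to avoid.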

James