Re: [PATCH v1 3/3] cgroup: relax common ancestor restriction for direct descendants

From: James Bottomley
Date: Thu Jul 21 2016 - 11:04:30 EST

On Thu, 2016-07-21 at 10:52 -0400, Tejun Heo wrote:
> Hello, Aleksa.
> On Thu, Jul 21, 2016 at 05:49:36PM +1000, Aleksa Sarai wrote:
> > > > The reason I'm doing this is so that we might be able to
> > > > _practically_ use cgroups as an unprivileged user (something
> > > > that will almost certainly be useful to not just the container
> > > > crowd, but people also planning on using cgroups as advanced
> > > > forms of rlimits).
> > >
> > > I don't get why we need this fragile dance with permissions at
> > > all when the same functionality can be achieved by delegating
> > > explicitly.
> >
> > The key words being "unprivileged user". Currently, if I am a
> > regular user on a system and I want to use the freezer cgroup to
> > pause a process I am running, I have to *go to the administrator
> > and ask them to give me permission to do that*. Why is that
> > necessary? I find it quite troubling that the use case of an
> > ordinary user on a system trying to use something as useful as
> > cgroups is considered to be "solved" by asking your administrator
> > (or systemd) to do it for you. "Delegating explicitly" is punting
> > on the problem, by saying "just get the administrator to do the
> > setup for you". What if you don't have the opportunity to do that,
> > and it takes 4 weeks of emails to get the administrator to do
> > _anything_?
> >
> > This is something I'm trying to fix with my recent work with
> > rootless containers (and quite a few other people are trying to fix
> > it too). Currently we simply can't do certain operations as an
> > unprivileged user that would be possible *if we could just use
> > cgroups*. Things like the freezer cgroup would be invaluable for
> > containers, and I guarantee that the Chromium and Firefox folks
> > would find it useful to be able to limit browser processes in a
> > similar way.
> I understand what you're trying to achieve but don't think cgroup's
> filesystem interface can accommodate that. To support that level of
> automatic delegation, the API should be providing enough isolation so
> that operations in one domain (user-specific operations) are
> transparent from the other (system-wide administration), which simply
> isn't true for cgroupfs. As a simple example, imagine a process
> being moved to another cgroup racing against the special operations
> you're describing. Both sides are multi-step operations, there is no
> way to synchronize them against each other from the kernel side, and
> the outcomes can easily be nonsensical.

So if I understand, it's not about actually moving the tasks: echoing
the pid to the tasks file is atomic and we can mediate races there.
It's about the debris left behind if the admin (or someone with
delegated authority) moves the task to a wholly different cgroup.
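For reference, the single-step move is a minimal sketch like the one below. It assumes a cgroup v1 hierarchy, where membership is changed by writing a PID to the target directory's `tasks` file (cgroup v2 uses `cgroup.procs` instead). `move_task_to_cgroup` is a hypothetical helper, not an existing API:

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): attach task `pid` to the
 * cgroup v1 directory `cgroup_dir` by writing its PID to "<dir>/tasks".
 * Equivalent to `echo $pid > <dir>/tasks`. Returns 0 on success, -1 on
 * failure.
 */
int move_task_to_cgroup(const char *cgroup_dir, pid_t pid)
{
    char path[4096];
    char buf[32];
    size_t len;
    ssize_t n;
    int fd;

    /* Build "<cgroup_dir>/tasks", refusing to truncate the path. */
    if (snprintf(path, sizeof(path), "%s/tasks", cgroup_dir) >= (int)sizeof(path))
        return -1;

    /* Format the PID the same way `echo` would: decimal plus newline. */
    snprintf(buf, sizeof(buf), "%d\n", (int)pid);
    len = strlen(buf);

    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;

    /* The whole move is one write(2) of the PID; this is the step the
     * mail above treats as atomic. */
    n = write(fd, buf, len);
    close(fd);
    return (n >= 0 && (size_t)n == len) ? 0 : -1;
}
```

On a real system this would be called as, say, `move_task_to_cgroup("/sys/fs/cgroup/freezer/mygroup", getpid())`, assuming the freezer hierarchy is mounted there and the caller has write permission on `tasks`. The race discussed here is not in this write. It is in the steps around it: creating the child directory, adjusting permissions, and the admin moving the task elsewhere in between.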

Now we have a cgroup directory in the old cgroup: the task has been
removed from it, yet the current user still has permissions on it and
could move the task back into it. Is that the essence of the