Re: [PATCH v3 2/2] cgroup: allow management of subtrees by new cgroup namespaces
From: Aleksa Sarai
Date: Mon May 02 2016 - 21:52:35 EST
>> Change the mode of the cgroup directory for each cgroup association,
>> allowing the process to create subtrees and modify the limits of the
>> subtrees *without* allowing the process to modify its own limits. Due to
>> the cgroup core restrictions and unix permission model, this allows for
>> processes to create new subtrees without breaking the cgroup limits for
>> the process.
>
> I don't get why this is necessary. What's wrong with the parent
> setting up permission correctly for the namespace?

The parent setting this up requires either:
1. A privileged process giving the process write access to the cgroup
directory it is currently in. Since no software does this by default,
and in addition it might not always make sense (systemd doesn't like
processes messing around in their respective cgroups), this has to be
dealt with better.
2. The process itself being privileged, which is not the use case
I'm going for with rootless containers. If you have root, you can do
whatever you want in this regard and this feature doesn't affect you.
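
To make option 1 concrete, here is a minimal sketch in Python of what such a
privileged helper might do. It assumes a cgroup v1 layout mounted under
/sys/fs/cgroup/<controller>, which varies between systems. The helper names
(parse_proc_cgroup, cgroup_dir, delegate) are hypothetical and are not part
of this patchset or of any existing tool.

```python
import os

# Assumption: cgroup v1 controllers are mounted at /sys/fs/cgroup/<controller>.
CGROUP_ROOT = "/sys/fs/cgroup"


def parse_proc_cgroup(text):
    """Map each controller name to its cgroup path, given /proc/<pid>/cgroup text."""
    paths = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line has the form "hierarchy-ID:controller-list:cgroup-path".
        _hier_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            paths[controller] = path
    return paths


def cgroup_dir(controller, path, root=CGROUP_ROOT):
    """Build the directory for a cgroup path under the assumed mount layout."""
    return os.path.join(root, controller, path.lstrip("/"))


def delegate(pid, uid, gid, controller="freezer"):
    """Hypothetical privileged helper: chown the cgroup directory of `pid`.

    Only the directory itself changes owner, so `uid` can mkdir subtrees,
    while the control files in that directory keep their current owner.
    Must run with enough privilege to chown the directory.
    """
    with open(f"/proc/{pid}/cgroup") as f:
        paths = parse_proc_cgroup(f.read())
    target = cgroup_dir(controller, paths[controller])
    os.chown(target, uid, gid)
    return target
```

Something has to run delegate() as root for every process that wants a
subtree, which is exactly the step that rootless containers cannot rely on.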
The main reason for this patchset is that I would like to make sure
that unprivileged processes can take advantage of cgroup features (such
as the freezer cgroup, or just regular resource limiting). Since
cgroups are a hierarchy, I can see no fundamental reason why this is not
possible. And the cgroup namespace appears to be the perfect way of
doing it. I firmly believe there is a simple and safe way of allowing
unprivileged processes to create subtrees of their current cgroup.
However, I agree with James that this patchset isn't ideal (it was my
first rough attempt). I think I'll get to work on properly virtualising
/sys/fs/cgroup, which will allow for a new cgroup namespace to modify
subtrees (but without allowing for cgroup escape) -- by pinning what pid
namespace the cgroup was created under. We can use the same type of
virtualization that /proc does (except instead of selectively showing
the dentries, we selectively show different owners of the dentries).
Would that be acceptable?
--
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/