Re: [RFC PATCH v2 11/17] cgroup: Implement new thread mode semantics

From: Tejun Heo
Date: Thu Jun 01 2017 - 16:38:28 EST


Hello,

On Thu, Jun 01, 2017 at 03:27:35PM -0400, Waiman Long wrote:
> As said in an earlier email, I agreed that masking it on the kernel side
> may not be the best solution. I offer 2 other alternatives:
> 1) Document on how to work around the resource domains issue by proper
> setup of the cgroup hierarchy.

We can definitely improve documentation.

> 2) Mark those controllers that require the no internal process
> competition constraint and disallow internal process only when those
> controllers are active.

We *can* do that but wouldn't this be equivalent to enabling thread
mode implicitly when only thread aware controllers are enabled?

> I prefer the first alternative, but I can go with the second if necessary.
>
> The major rationale behind my enhanced thread mode patch was to allow
> something like
>
>     R -- A -- B
>      \
>       T1 -- T2
>
> where you can have resource domain controllers enabled in the thread
> root as well as some child cgroups of the thread root. As the no
> internal process rule is currently not applicable to the thread root,
> this creates the dilemma that we need to deal with internal process
> competition.
>
> The container invariant that PeterZ talked about will also be a serious
> issue here as I don't think we are going to set up a container root
> cgroup that will have no process allowed in it because it has some child
> cgroups. IMHO, I don't think cgroup v2 will get wide adoption without
> getting rid of that no internal process constraint.

The only thing which is necessary from inside a container is putting
the management processes into their own cgroups so that they can be
controlled (i.e. the same thing you did with your patch, but done
explicitly from userland). Userland management software can do the
same thing whether it's inside a container or on a bare system. BTW,
systemd already does so and works completely fine in terms of
containerization on cgroup2. It is arguable whether we should make
this more convenient from the kernel side, but using cgroup2 for
resource control already requires the userspace tools to be adapted
to it, so I'm not sure how much benefit we'd gain from adding that
compared to explicitly documenting it.
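
To make the userland side concrete, here is a rough sketch of what a
management tool could do: move the processes sitting in a cgroup into
a leaf child first, then enable controllers for the children. It only
relies on the cgroup2 interface files cgroup.procs and
cgroup.subtree_control. The function names and the "mgmt" leaf name
are made up for illustration, and this is not how systemd or any
particular tool implements it.

```python
from pathlib import Path


def read_pids(cgroup):
    """Return the PIDs currently listed in <cgroup>/cgroup.procs."""
    procs = Path(cgroup) / "cgroup.procs"
    return [int(tok) for tok in procs.read_text().split()]


def move_processes_to_leaf(cgroup, leaf_name="mgmt"):
    """Move every process in `cgroup` into the child cgroup `leaf_name`.

    `leaf_name` is a hypothetical name chosen for this sketch.
    Returns the list of PIDs that were written to the leaf.
    """
    leaf = Path(cgroup) / leaf_name
    leaf.mkdir(exist_ok=True)
    moved = []
    for pid in read_pids(cgroup):
        try:
            # One PID per write: open, write and close for each PID.
            with open(leaf / "cgroup.procs", "a") as f:
                f.write(f"{pid}\n")
        except ProcessLookupError:
            # The process exited before it could be migrated.
            continue
        moved.append(pid)
    return moved


def enable_controllers(cgroup, controllers):
    """Enable `controllers` for the children of `cgroup`.

    Call this after the cgroup's own processes have been moved out,
    so that the cgroup no longer hosts processes itself.
    """
    request = " ".join(f"+{name}" for name in controllers)
    (Path(cgroup) / "cgroup.subtree_control").write_text(request)
    return request
```

On a real system the cgroup argument would be a directory under the
cgroup2 mount point, the caller needs write access to it, and the
controllers must be available in that cgroup's cgroup.controllers.
The same two calls apply whether the directory is a container's root
cgroup or one on a bare system.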

Thanks.

--
tejun