Re: [RFC PATCH v2 12/17] cgroup: Remove cgroup v2 no internal process constraint

From: Waiman Long
Date: Mon May 22 2017 - 12:56:24 EST


On 05/19/2017 04:38 PM, Tejun Heo wrote:
> Hello, Waiman.
>
> On Mon, May 15, 2017 at 09:34:11AM -0400, Waiman Long wrote:
>> The rationale behind the cgroup v2 no internal process constraint is
>> to avoid resource competition between internal processes and child
>> cgroups. However, not all controllers have a problem with internal
>> process competition. Enforcing this rule may lead to unnatural process
>> hierarchy and unneeded levels for those controllers.
> This isn't necessarily something we can determine by looking at the
> current state of controllers. It's true that some controllers - pid
> and perf - inherently only care about membership of each task but at
> the same time neither really suffers from the constraint either. CPU,
> which is the problematic one here and currently only cares about tasks,
> actually distributes resources which have parts that are specific to
> domains rather than threads, and we don't want to declare that CPU isn't
> a domain-aware resource because it inherently is.

I agree that it is hard to decide which controller should be regarded as
domain aware and which should not be. That is why I don't attempt to do
that in the v2 patchset.

Unlike my v1 patch, where each controller had to be specifically marked
as a resource domain in order to get a special directory for internal
process resource control knobs, the v2 patch leaves the decision up to
userland. Depending on the context, any controller can now have
special resource control knobs for internal processes in the
cgroup.resource_domain directory by writing the controller name to the
cgroup.resource_control file. So even the CPU controller can be regarded
as domain aware, if necessary. This is all part of my move to give
userland as much freedom and flexibility as possible.
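
As a rough illustration of that userland step, here is a minimal Python
sketch. The file name cgroup.resource_control and the "+" prefix come from
the patch description; the helper name enable_resource_domain is
hypothetical, and the demo writes into a scratch directory rather than a
real cgroup mount, since this interface exists only in this RFC.

```python
import tempfile
from pathlib import Path

# File name as given in the patch description for this RFC.
RESOURCE_CONTROL = "cgroup.resource_control"


def enable_resource_domain(cgroup_dir, controller):
    """Write '+<controller>' to the proposed control file of one cgroup.

    Hypothetical helper: it only performs the write described in the
    patch and does not check that the kernel supports the file.
    """
    control_file = Path(cgroup_dir) / RESOURCE_CONTROL
    control_file.write_text(f"+{controller}\n")
    return control_file


if __name__ == "__main__":
    # Stand-in directory; on a patched kernel this would be a cgroup v2 path.
    with tempfile.TemporaryDirectory() as scratch:
        written = enable_resource_domain(scratch, "cpu")
        print(written.name, "<-", written.read_text().strip())
```

With the patch applied, the same write against a real cgroup directory is
what would make the kernel populate cgroup.resource_domain with the
controller's knobs for the internal processes.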

>> This patch removes the no internal process constraint by enabling those
>> controllers that don't like internal process competition to have a
>> separate set of control knobs just for internal processes in a cgroup.
>>
>> A new control file "cgroup.resource_control" is added. Enabling a
>> controller with a "+" prefix will create a separate set of control
>> knobs for that controller in the special "cgroup.resource_domain"
>> sub-directory for all the internal processes. The existing control
>> knobs in the cgroup will then be used to manage resource distribution
>> between internal processes as a group and other child cgroups.
> We would need to declare all major resource controllers to be needing
> that special sub-directory. That'd work around the
> no-internal-process constraint but I don't think it is solving any
> real problems. It's just the kernel doing something that userland can
> do with ease and more context.

All controllers can use the special sub-directory if userland chooses to
do so. The problem that I am trying to address in this patch is to allow
a more natural hierarchy that reflects a certain purpose, like the task
classification done by systemd. Restricting tasks to leaf nodes only
makes the hierarchy unnatural and probably difficult to manage.
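
To make the constraint concrete, here is a small Python model of a
hierarchy. It is a sketch under simplifying assumptions: the Cgroup class
and the function name are hypothetical, the cgroup names are only
systemd-style examples, and it assumes controllers are enabled for every
child, so any non-root cgroup holding both processes and children counts
as a violation.

```python
from dataclasses import dataclass, field


@dataclass
class Cgroup:
    """Toy model of a cgroup: a name, a process count, and child cgroups."""
    name: str
    procs: int = 0
    children: list = field(default_factory=list)


def internal_process_violations(node, is_root=True):
    """Return names of non-root cgroups that have both processes and children.

    Assumption: controllers are enabled for all children, so every such
    cgroup would break the no internal process constraint.
    """
    found = []
    if not is_root and node.procs > 0 and node.children:
        found.append(node.name)
    for child in node.children:
        found.extend(internal_process_violations(child, is_root=False))
    return found


if __name__ == "__main__":
    session = Cgroup("session-1.scope", procs=3)
    user = Cgroup("user-1000.slice", procs=2, children=[session])
    root = Cgroup("/", procs=1, children=[Cgroup("user.slice", children=[user])])
    print(internal_process_violations(root))
```

In this model user-1000.slice is the cgroup that would have to push its
own two processes down into an extra leaf to satisfy the constraint.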

Regards,
Longman