Re: [PATCH v8 2/6] cpuset: Add new v2 cpuset.sched.domain flag

From: Peter Zijlstra
Date: Thu May 24 2018 - 10:49:53 EST


On Thu, May 17, 2018 at 04:55:41PM -0400, Waiman Long wrote:
> A new cpuset.sched.domain boolean flag is added to cpuset v2. This new
> flag indicates that the CPUs in the current cpuset should be treated
> as a separate scheduling domain.

The traditional name for this is a partition.

> This new flag is owned by the parent
> and will cause the CPUs in the cpuset to be removed from the effective
> CPUs of its parent.

This is a significant departure from existing behaviour, but one I can
appreciate. I don't immediately see something terribly wrong with it.

> This is implemented internally by adding a new isolated_cpus mask that
> holds the CPUs belonging to child scheduling domain cpusets so that:
>
> isolated_cpus | effective_cpus = cpus_allowed
> isolated_cpus & effective_cpus = 0
>
> This new flag can only be turned on in a cpuset if its parent is either
> root or a scheduling domain itself with non-empty cpu list. The state
> of this flag cannot be changed if the cpuset has children.
>
> Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
> ---
> Documentation/cgroup-v2.txt | 22 ++++
> kernel/cgroup/cpuset.c | 237 +++++++++++++++++++++++++++++++++++++++++++-
> 2 files changed, 256 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/cgroup-v2.txt b/Documentation/cgroup-v2.txt
> index cf7bac6..54d9e22 100644
> --- a/Documentation/cgroup-v2.txt
> +++ b/Documentation/cgroup-v2.txt
> @@ -1514,6 +1514,28 @@ Cpuset Interface Files
> it is a subset of "cpuset.mems". Its value will be affected
> by memory nodes hotplug events.
>
> + cpuset.sched.domain
> + A read-write single value file which exists on non-root
> + cpuset-enabled cgroups. It is a binary value flag that accepts
> + either "0" (off) or a non-zero value (on).

I would be conservative and only allow 0/1.

> This flag is set
> + by the parent and is not delegatable.
> +
> + If set, it indicates that the CPUs in the current cgroup will
> + be the root of a scheduling domain. The root cgroup is always
> + a scheduling domain. There are constraints on where this flag
> + can be set. It can only be set in a cgroup if all the following
> + conditions are true.
> +
> + 1) The parent cgroup is also a scheduling domain with a non-empty
> + cpu list.

Ah, so initially I was confused by the requirement for root to have it
always set, but you'll allow child domains to steal _all_ CPUs, such
that root ends up with an empty effective set?

What about the (kernel) threads that cannot be moved out of the root
group?

> + 2) The list of CPUs are exclusive, i.e. they are not shared by
> + any of its siblings.

Right.

> + 3) There is no child cgroups with cpuset enabled.
> +
> + Setting this flag will take the CPUs away from the effective
> + CPUs of the parent cgroup. Once it is set, this flag cannot be
> + cleared if there are any child cgroups with cpuset enabled.

This I'm not clear on. Why?