Re: [PATCH v7 5/6] cgroup/cpuset: Update description of cpuset.cpus.partition in cgroup-v2.rst

From: Waiman Long
Date: Wed Oct 13 2021 - 18:11:57 EST


On 10/13/21 5:45 PM, Waiman Long wrote:



In conclusion, it'd be good to have validity conditions separate from
transition conditions (since hotplug transition can't be rejected) and
perhaps treat administrative changes from an ancestor equally as a
hotplug.

I am trying to make the result of changing "cpuset.cpus" as close to that of hotplug as possible, but there are cases where a "cpuset.cpus" change is prohibited while hotplug can still remove the CPU.

Hope this will help to clarify the current design.

BTW, the attached file is the current draft of the cpuset.cpus.partition document.

Cheers,
Longman

cpuset.cpus.partition
A read-write single value file which exists on non-root
cpuset-enabled cgroups. This flag is owned by the parent cgroup
and is not delegatable.

It accepts only the following input values when written to.

==========  =====================================
"member"    Non-root member of a partition
"root"      Partition root
"isolated"  Partition root without load balancing
==========  =====================================

When set to be a partition root, the current cgroup is the
root of a new partition or scheduling domain that comprises
itself and all its descendants except those that are separate
partition roots themselves and their descendants. The root
cgroup is always a partition root.

When set to "isolated", the CPUs in that partition root will
be in an isolated state without any load balancing from the
scheduler. Tasks in such a partition must be explicitly bound
to each individual CPU.
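As an illustration of binding each task to a single CPU in an isolated partition, here is a minimal sketch. The round-robin policy and the helper names `round_robin_pinning` and `apply_pinning` are hypothetical; only `os.sched_setaffinity` is a real (Linux-only) standard library call. The task IDs and the partition's CPUs are assumed to be supplied by the caller.

```python
import os


def round_robin_pinning(task_ids, cpus):
    """Hypothetical policy: map each task ID to exactly one CPU, cycling in order."""
    ordered = sorted(cpus)
    if not ordered:
        raise ValueError("isolated partition has no CPUs to bind to")
    return {tid: ordered[i % len(ordered)] for i, tid in enumerate(task_ids)}


def apply_pinning(plan):
    """Bind every task to its single CPU (Linux only)."""
    for tid, cpu in plan.items():
        # sched_setaffinity() affects only the one task (thread) ID given.
        os.sched_setaffinity(tid, {cpu})
```

For example, `round_robin_pinning([100, 101, 102], {4, 5})` yields `{100: 4, 101: 5, 102: 4}`. Any assignment works as long as each task ends up with a one-CPU mask, since the scheduler will not move tasks between the isolated CPUs.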

"cpuset.cpus" must always be set up first before enabling
a partition. Unlike a "member", whose "cpuset.cpus.effective" can
contain CPUs not in "cpuset.cpus", a valid partition root can
never have such CPUs. In other words, "cpuset.cpus.effective"
is always a subset of "cpuset.cpus" for a valid partition root.
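A minimal sketch of the required ordering, assuming cgroup v2 is mounted and the target cgroup directory already exists. `make_partition_root` and `format_cpulist` are hypothetical helpers; the range-list syntax (e.g. "2-4,7") is the usual cpuset list format.

```python
import os


def format_cpulist(cpus):
    """Render a set of CPU numbers as a range list, e.g. {2, 3, 4, 7} -> "2-4,7"."""
    ordered = sorted(cpus)
    parts = []
    i = 0
    while i < len(ordered):
        start = ordered[i]
        # Extend the current range while the CPU numbers stay consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == ordered[i] + 1:
            i += 1
        end = ordered[i]
        parts.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(parts)


def make_partition_root(cgroup_dir, cpus, mode="root"):
    """Write "cpuset.cpus" first, and only then enable the partition."""
    if mode not in ("root", "isolated"):
        raise ValueError(f"unsupported partition mode: {mode!r}")
    if not cpus:
        raise ValueError("cpuset.cpus must be non-empty")
    # Step 1: set up cpuset.cpus.
    with open(os.path.join(cgroup_dir, "cpuset.cpus"), "w") as f:
        f.write(format_cpulist(cpus))
    # Step 2: switch the cgroup to a partition root.
    with open(os.path.join(cgroup_dir, "cpuset.cpus.partition"), "w") as f:
        f.write(mode)
```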

When a parent partition root cannot exclusively grant any of
the CPUs specified in "cpuset.cpus", "cpuset.cpus.effective"
becomes empty. If there are tasks in the partition root, the
partition root becomes invalid and "cpuset.cpus.effective"
is reset to that of the nearest non-empty ancestor.
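The rule above can be modelled as a small function. This is a simplified sketch, not the kernel's logic: CPU lists are Python sets, `grantable` stands for the CPUs the parent can exclusively grant, and `resolve_effective` is a hypothetical name.

```python
def resolve_effective(requested, grantable, has_tasks, ancestor_effectives):
    """Return (effective_cpus, is_valid) for a partition root.

    ancestor_effectives lists the ancestors' effective CPU sets, nearest first.
    """
    granted = requested & grantable
    if granted:
        return granted, True
    if not has_tasks:
        # An empty effective set is tolerated when there are no tasks.
        return set(), True
    # Tasks present: the partition becomes invalid and falls back to the
    # effective CPUs of the nearest ancestor that still has some.
    for eff in ancestor_effectives:
        if eff:
            return set(eff), False
    raise ValueError("no ancestor with non-empty effective CPUs")
```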

Note that a task cannot be moved to a cgroup with empty
"cpuset.cpus.effective".

There are additional constraints on where a partition root can
be enabled ("root" or "isolated"). It can only be enabled in
a cgroup if all the following conditions are met.

1) The "cpuset.cpus" is non-empty and exclusive, i.e. its CPUs
are not shared by any of its siblings.
2) The parent cgroup is a valid partition root.
3) The "cpuset.cpus" is a subset of the parent's "cpuset.cpus".
4) There are no child cgroups with cpuset enabled. This avoids
CPU migrations of multiple cgroups simultaneously, which can
be problematic.
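The four enabling conditions can be expressed as a pre-flight check that a management agent might run before writing "root" or "isolated". A sketch only: CPU lists are Python sets, and `can_enable_partition` and its parameters are hypothetical.

```python
def can_enable_partition(cpus, parent_cpus, parent_is_valid_root,
                         sibling_cpus, has_cpuset_enabled_children):
    """Return the list of violated enabling conditions (empty means allowed)."""
    problems = []
    if not cpus:
        problems.append("1) cpuset.cpus is empty")
    if any(cpus & sib for sib in sibling_cpus):
        problems.append("1) cpuset.cpus is shared with a sibling")
    if not parent_is_valid_root:
        problems.append("2) parent is not a valid partition root")
    if not cpus <= parent_cpus:
        problems.append("3) cpuset.cpus is not a subset of the parent's")
    if has_cpuset_enabled_children:
        problems.append("4) child cgroups have cpuset enabled")
    return problems
```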

On read, the "cpuset.cpus.partition" file can show the following
values.

=========================  =====================================
"member"                   Non-root member of a partition
"root"                     Partition root
"isolated"                 Partition root without load balancing
"root invalid (<reason>)"  Invalid partition root
=========================  =====================================

In the case of an invalid partition root, a descriptive string
explaining why the partition is invalid is included within
parentheses.
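A user space agent will usually want to split the read value into its parts. A minimal sketch; `parse_partition_state` is a hypothetical helper, and accepting any mode before " invalid" (not only "root") is an assumption beyond the table above.

```python
def parse_partition_state(text):
    """Split a "cpuset.cpus.partition" read value into (mode, valid, reason)."""
    value = text.strip()
    if value in ("member", "root", "isolated"):
        return value, True, None
    # Expected shape: "<mode> invalid (<reason>)".
    mode, sep, rest = value.partition(" invalid")
    if sep and rest.startswith(" (") and rest.endswith(")"):
        return mode, False, rest[2:-1]
    raise ValueError(f"unexpected cpuset.cpus.partition value: {value!r}")
```

The reason text is passed through unchanged, so it can be logged verbatim.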

Once a cgroup becomes a partition root, changes to its
"cpuset.cpus" are generally allowed as long as the CPU list
remains exclusive and is a superset of its children's CPU lists.
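The rule for changing "cpuset.cpus" on an existing partition root can be sketched as a predicate over Python sets. `cpus_change_allowed` is a hypothetical name, and it only models the two conditions stated above.

```python
def cpus_change_allowed(new_cpus, sibling_cpus, child_cpus):
    """Model: the new list must stay exclusive and cover every child's list."""
    exclusive = not any(new_cpus & sib for sib in sibling_cpus)
    covers_children = all(child <= new_cpus for child in child_cpus)
    return exclusive and covers_children
```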

The constraints of a valid partition root are as follows:

1) "cpuset.cpus" is non-empty and exclusive.
2) The parent cgroup is a valid partition root.
3) "cpuset.cpus.effective" is a subset of "cpuset.cpus".
4) "cpuset.cpus.effective" is non-empty when there are tasks
in the partition.
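Unlike the enabling conditions, these constraints must hold continuously, including after hotplug. A sketch of them as a single predicate, with CPU lists as Python sets; `is_valid_partition_root` is a hypothetical helper.

```python
def is_valid_partition_root(cpus, sibling_cpus, parent_is_valid_root,
                            effective, has_tasks):
    """Check the four constraints of a valid partition root."""
    return (bool(cpus)                                       # 1) non-empty
            and not any(cpus & sib for sib in sibling_cpus)  # 1) exclusive
            and parent_is_valid_root                         # 2)
            and effective <= cpus                            # 3)
            and (bool(effective) or not has_tasks))          # 4)
```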

Changes to "cpuset.cpus" or CPU hotplug may cause the state
of a valid partition root to become invalid when one or more
constraints of a valid partition root are violated. Therefore,
user space agents that manage partition roots should avoid
unnecessary changes to "cpuset.cpus" and always check the state
of "cpuset.cpus.partition" after making changes to make sure
that the partitions are functioning as expected.
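A minimal sketch of the "change, then check" pattern. It assumes `cgroup_dir` is an existing partition root directory; `update_cpus_and_verify` is a hypothetical helper. A successful write alone does not prove the partition is still valid, hence the read-back.

```python
import os


def update_cpus_and_verify(cgroup_dir, cpulist):
    """Write a new "cpuset.cpus" value, then confirm the partition is still valid.

    Returns the value read back from "cpuset.cpus.partition".
    Raises OSError if the write is rejected, RuntimeError if the partition
    root became invalid as a result of the change.
    """
    with open(os.path.join(cgroup_dir, "cpuset.cpus"), "w") as f:
        f.write(cpulist)
    with open(os.path.join(cgroup_dir, "cpuset.cpus.partition")) as f:
        state = f.read().strip()
    # Invalid partition roots read back as "root invalid (<reason>)".
    if " invalid" in state:
        raise RuntimeError(f"partition root became invalid: {state}")
    return state
```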

Changing a partition root to "member" is always allowed.
If there are child partition roots underneath it, however,
they will be forced back to "member" as well and lose their
partitions. Care must therefore be taken to check for this
condition before disabling a partition root.
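One way to do that check is to walk the cgroup subtree and report every descendant whose state is not "member". A sketch assuming a cgroup v2 mount; `descendant_partition_roots` is a hypothetical helper.

```python
import os


def descendant_partition_roots(cgroup_dir):
    """Return sorted (relative_path, state) pairs for descendant partition roots."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(cgroup_dir):
        if dirpath == cgroup_dir:
            continue  # skip the cgroup that is about to be disabled
        if "cpuset.cpus.partition" not in filenames:
            continue  # cpuset not enabled in this cgroup
        with open(os.path.join(dirpath, "cpuset.cpus.partition")) as f:
            state = f.read().strip()
        if state != "member":
            found.append((os.path.relpath(dirpath, cgroup_dir), state))
    return sorted(found)
```

An empty result means switching `cgroup_dir` to "member" will not tear down any nested partition.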

Setting a cgroup to a valid partition root will take the CPUs
away from the effective CPUs of the parent partition.

A valid parent partition may distribute all its CPUs to its
child partitions as long as it is not the root cgroup, since
some housekeeping CPUs are needed in the root cgroup.
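The two paragraphs above can be modelled together: the parent keeps whatever its child partitions do not take, and only the root cgroup is forbidden from ending up with nothing. A simplified sketch that assumes each child receives exactly its "cpuset.cpus"; `parent_effective_after_grants` is a hypothetical name.

```python
def parent_effective_after_grants(parent_effective, child_partition_cpus,
                                  is_root_cgroup):
    """Model the parent's effective CPUs after child partitions take theirs."""
    remaining = set(parent_effective)
    for cpus in child_partition_cpus:
        remaining -= cpus
    if is_root_cgroup and not remaining:
        raise ValueError("root cgroup must keep at least one housekeeping CPU")
    return remaining
```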

An invalid partition is not a real partition even though some
internal states may still be kept.

An invalid partition root can be reverted to a real partition
root if none of the constraints of a valid partition root are
violated.

Poll and inotify events are triggered whenever the state of
"cpuset.cpus.partition" changes. That includes changes caused by
writes to "cpuset.cpus.partition", CPU hotplug and other changes
that make the partition invalid. This allows user space agents
to monitor unexpected changes to "cpuset.cpus.partition"
without having to re-read the file periodically.