cgroups(7): documenting cgroups v2 thread mode

From: Michael Kerrisk (man-pages)
Date: Tue Jan 02 2018 - 13:24:17 EST


Hello Tejun and all,

To date, the cgroups(7) manual page does not document thread mode
(added in Linux 4.14). Furthermore, the documentation in
Documentation/cgroup-v2.txt is, I think, a little thin.

I have attempted to address this by adding some extensive documentation
to the cgroups(7) manual page. This text is based on some reading
of Documentation/cgroup-v2.txt, reading of the kernel source, and
quite a lot of experimentation.

The plain-text version (for easy review) is shown below. I would be
happy to receive review comments/corrections/improvements on the text below.

In particular, Tejun and Peter, I would be very happy if you could
take some time to look at this text.

The branch containing the pending cgroups(7) changes can be found at:
https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_cgroup_updates

[[
CGROUPS V2 THREAD MODE
Among the restrictions imposed by cgroups v2 that were not
present in cgroups v1 are the following:

* No thread-granularity control: all of the threads of a
process must be in the same cgroup.

* No internal processes: a cgroup can't both have member
processes and exercise controllers on child cgroups.

Both of these restrictions were added because the lack of these
restrictions had caused problems in cgroups v1. In particular,
the cgroups v1 ability to allow thread-level granularity for
cgroup membership made no sense for some controllers. (A
notable example was the memory controller: since threads share
an address space, it made no sense to split threads across
different memory cgroups.)

Notwithstanding the initial design decision in cgroups v2,
there were use cases for certain controllers, notably the cpu
controller, for which thread-level granularity of control was
meaningful and useful. To accommodate such use cases, Linux
4.14 added thread mode for cgroups v2.

Thread mode allows the following:

* The creation of threaded subtrees in which the threads of a
process may be spread across cgroups inside the tree. (A
threaded subtree may contain multiple multithreaded
processes.)

* The concept of threaded controllers, which can distribute
resources across the cgroups in a threaded subtree.

* A relaxation of the "no internal processes" rule, so that,
within a threaded subtree, a cgroup can both contain member
threads and exercise resource control over child cgroups.

With the addition of thread mode, each nonroot cgroup now
contains a new file, cgroup.type, that exposes, and in some
circumstances can be used to change, the "type" of a cgroup. This
file contains one of the following type values:

domain This is a normal v2 cgroup that provides
process-granularity control. If a process is a member of
this cgroup, then all threads of the process are (by
definition) in the same cgroup. This is the default cgroup
type, and provides the same behavior that was provided
for cgroups in the initial cgroups v2 implementation.

threaded
This cgroup is a member of a threaded subtree. Threads
can be added to this cgroup, and controllers can be
enabled for the cgroup.

domain threaded
This is a domain cgroup that serves as the root of a
threaded subtree. This cgroup type is also known as
"threaded root".

domain invalid
This is a cgroup inside a threaded subtree that is in an
"invalid" state. Processes can't be added to the
cgroup, and controllers can't be enabled for the cgroup.
The only thing that can be done with this cgroup (other
than deleting it) is to convert it to a threaded cgroup
by writing the string "threaded" to the cgroup.type
file.
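As an illustration, the following C sketch reads a cgroup.type file and maps its contents to one of the four types listed above. The names enum cgroup_type, cgroup_type_from_string(), and read_cgroup_type() are hypothetical helpers, not part of any library, and the sketch assumes the caller passes the path of a cgroup directory in a mounted v2 hierarchy (for example, under /sys/fs/cgroup, a common but not universal mount point).

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

/* The values that can appear in a cgroup.type file. */
enum cgroup_type {
    CG_TYPE_UNKNOWN,            /* unreadable or unrecognized */
    CG_TYPE_DOMAIN,
    CG_TYPE_THREADED,
    CG_TYPE_DOMAIN_THREADED,
    CG_TYPE_DOMAIN_INVALID
};

static const char *const cgroup_type_names[] = {
    [CG_TYPE_DOMAIN]          = "domain",
    [CG_TYPE_THREADED]        = "threaded",
    [CG_TYPE_DOMAIN_THREADED] = "domain threaded",
    [CG_TYPE_DOMAIN_INVALID]  = "domain invalid",
};

/* Map the text of a cgroup.type file (with or without the
   trailing newline) to an enum value. */
enum cgroup_type cgroup_type_from_string(const char *s)
{
    size_t len = strcspn(s, "\n");

    for (int t = CG_TYPE_DOMAIN; t <= CG_TYPE_DOMAIN_INVALID; t++) {
        if (strlen(cgroup_type_names[t]) == len &&
            strncmp(s, cgroup_type_names[t], len) == 0)
            return (enum cgroup_type) t;
    }
    return CG_TYPE_UNKNOWN;
}

/* Read <cgroup_dir>/cgroup.type. The root cgroup has no such
   file, so CG_TYPE_UNKNOWN is returned for it. */
enum cgroup_type read_cgroup_type(const char *cgroup_dir)
{
    char path[4096], buf[64];
    FILE *fp;

    snprintf(path, sizeof(path), "%s/cgroup.type", cgroup_dir);
    fp = fopen(path, "r");
    if (fp == NULL)
        return CG_TYPE_UNKNOWN;
    if (fgets(buf, sizeof(buf), fp) == NULL)
        buf[0] = '\0';
    fclose(fp);
    return cgroup_type_from_string(buf);
}
```

Because the root cgroup has no cgroup.type file, a caller that needs to distinguish "root" from "unreadable" would have to check for the root path separately.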

Threaded versus domain controllers
With the addition of thread mode, cgroups v2 now distinguishes
two types of resource controllers:

* Threaded controllers: these controllers support
thread-granularity for resource control and can be enabled
inside threaded subtrees, with the result that the
corresponding controller-interface files appear inside the
cgroups in the threaded subtree. As at Linux 4.15, the
following controllers are threaded: cpu, perf_event, and pids.

* Domain controllers: these controllers support only process
granularity for resource control. From the perspective of a
domain controller, all threads of a process are always in
the same cgroup. Domain controllers can't be enabled inside
a threaded subtree.
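A program that needs to know whether a set of controllers could be enabled inside a threaded subtree might classify them as in the following C sketch. The list of threaded controllers is hard-coded from the Linux 4.15 list given above and is an assumption for other kernel versions; is_threaded_controller() and has_domain_controller() are hypothetical helpers that parse the space-separated format used by cgroup.controllers and cgroup.subtree_control.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Threaded controllers as of Linux 4.15, per the list above. */
static const char *const threaded_controllers[] = {
    "cpu", "perf_event", "pids",
};

/* Is the len-character name starting at 'name' a threaded controller? */
bool is_threaded_controller(const char *name, size_t len)
{
    size_t n = sizeof(threaded_controllers) / sizeof(threaded_controllers[0]);

    for (size_t i = 0; i < n; i++)
        if (strlen(threaded_controllers[i]) == len &&
            strncmp(name, threaded_controllers[i], len) == 0)
            return true;
    return false;
}

/* Does a space-separated controller list, in the format of
   cgroup.controllers or cgroup.subtree_control, name at least
   one domain (i.e., non-threaded) controller? */
bool has_domain_controller(const char *list)
{
    const char *p = list;

    while (*p != '\0') {
        p += strspn(p, " \n");              /* skip separators */
        size_t len = strcspn(p, " \n");     /* length of next name */
        if (len > 0 && !is_threaded_controller(p, len))
            return true;
        p += len;
    }
    return false;
}
```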

Creating a threaded subtree
There are two pathways that lead to the creation of a threaded
subtree. The first pathway proceeds as follows:

1. We write the string "threaded" to the cgroup.type file of a
cgroup y/z that currently has the type domain. This has the
following effects:

* The type of the cgroup y/z becomes threaded.

* The type of the parent cgroup, y, becomes domain
threaded. The parent cgroup is the root of a threaded
subtree (also known as the "threaded root").

* All other cgroups under y that were not already of type
threaded (because they were inside already existing
threaded subtrees under the new threaded root) are
converted to type domain invalid. Any subsequently created
cgroups under y will also have the type domain invalid.

2. We write the string "threaded" to each of the domain invalid
cgroups under y, in order to convert them to the type
threaded. As a consequence of this step, all cgroups under
the threaded root now have the type threaded and the
threaded subtree is now fully usable. The requirement to
write "threaded" to each of these cgroups is somewhat
cumbersome, but allows for possible future extensions to the
thread-mode model.
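The two steps of the first pathway can be sketched in C as follows. This is an illustration under stated assumptions, not a reference implementation: parent and child name existing cgroup directories in a v2 hierarchy, error handling is minimal, and write_str(), convert_domain_invalid(), and make_threaded_subtree() are hypothetical names. Step 2 walks the subtree in pre-order so that each cgroup is converted before its children, which respects the top-down rule given under "Rules for writing to cgroup.type and creating threaded subtrees".

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write str to an existing file, like "echo str > path". */
static int write_str(const char *path, const char *str)
{
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd == -1)
        return -1;
    ssize_t len = (ssize_t) strlen(str);
    ssize_t n = write(fd, str, (size_t) len);
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return (n == len) ? 0 : -1;
}

/* Does <dir>/cgroup.type contain "domain invalid"? */
static int is_domain_invalid(const char *dir)
{
    char path[4200], buf[64] = "";
    FILE *fp;

    snprintf(path, sizeof(path), "%s/cgroup.type", dir);
    fp = fopen(path, "r");
    if (fp == NULL)
        return 0;
    if (fgets(buf, sizeof(buf), fp) == NULL)
        buf[0] = '\0';
    fclose(fp);
    buf[strcspn(buf, "\n")] = '\0';
    return strcmp(buf, "domain invalid") == 0;
}

/* Step 2: pre-order walk below dir, converting each domain invalid
   cgroup before visiting its children (conversion is top-down).
   Returns the number of cgroups converted, or -1 on error. */
static int convert_domain_invalid(const char *dir)
{
    DIR *d = opendir(dir);
    struct dirent *ent;
    int count = 0;

    if (d == NULL)
        return -1;
    while ((ent = readdir(d)) != NULL) {
        char child[4200], typefile[4400];
        struct stat sb;
        int n;

        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        snprintf(child, sizeof(child), "%s/%s", dir, ent->d_name);
        if (stat(child, &sb) == -1 || !S_ISDIR(sb.st_mode))
            continue;                   /* child cgroups are directories */
        if (is_domain_invalid(child)) {
            snprintf(typefile, sizeof(typefile), "%s/cgroup.type", child);
            if (write_str(typefile, "threaded") == -1)
                goto fail;
            count++;
        }
        n = convert_domain_invalid(child);  /* children after parent */
        if (n == -1)
            goto fail;
        count += n;
    }
    closedir(d);
    return count;

fail:
    closedir(d);
    return -1;
}

/* Step 1 (write "threaded" to parent/child), then step 2. */
int make_threaded_subtree(const char *parent, const char *child)
{
    char typefile[4200];

    snprintf(typefile, sizeof(typefile), "%s/%s/cgroup.type", parent, child);
    if (write_str(typefile, "threaded") == -1)
        return -1;
    return convert_domain_invalid(parent);
}
```

For example, make_threaded_subtree("/sys/fs/cgroup/y", "z") would perform both steps for the cgroup y/z used in the description above, returning the number of cgroups converted in step 2, or -1 on error.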

┌─────────────────────────────────────────────────────┐
│FIXME                                                │
├─────────────────────────────────────────────────────┤
│Re the preceding paragraphs... Are there other       │
│reasons for the (cumbersome) requirement to write    │
│'threaded' to each of the cgroup.type files in the   │
│threaded subtrees? Tejun Heo mentioned the           │
│following:                                           │
│                                                     │
│    Consistency w/ the cgroups right under the root  │
│    cgroup. Because they can be both domains and     │
│    threadroots, we can't switch the children over   │
│    to thread mode automatically. Doing that for     │
│    cgroups further down in the hierarchy would be   │
│    really inconsistent.                             │
│                                                     │
│But it's not clear to me how "Doing that for         │
│cgroups further down in the hierarchy would be       │
│really inconsistent", since in the current           │
│implementation, those same cgroups are converted to  │
│the "domain invalid" type. What am I missing?        │
└─────────────────────────────────────────────────────┘

The second way of creating a threaded subtree is as follows:

1. In an existing cgroup, z, that currently has the type
domain, we (1) enable one or more threaded controllers and
(2) make a process a member of z. (These two steps can be
done in either order.) This has the following consequences:

* The type of z becomes domain threaded.

* All of the descendant cgroups of z that were not
already of type threaded are converted to type domain
invalid.

2. As before, we make the threaded subtree usable by writing
the string "threaded" to each of the domain invalid cgroups
under z, in order to convert them to the type threaded.
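Step 1 of the second pathway amounts to two writes, sketched below in C. The sketch assumes that z is the path of an existing domain cgroup, that the named threaded controller (for example, "cpu") is already available in z (i.e., enabled in the parent's cgroup.subtree_control), and that make_threaded_root() and write_str() are hypothetical helper names. Step 2 (converting the domain invalid descendants) is not shown.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write str to an existing file, like "echo str > path". */
static int write_str(const char *path, const char *str)
{
    int fd = open(path, O_WRONLY | O_TRUNC);
    if (fd == -1)
        return -1;
    ssize_t len = (ssize_t) strlen(str);
    ssize_t n = write(fd, str, (size_t) len);
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return (n == len) ? 0 : -1;
}

/* Second pathway, step 1: enable a threaded controller for z's
   children and make process pid a member of z. According to the
   text, the two writes may be done in either order. */
int make_threaded_root(const char *z, const char *controller, pid_t pid)
{
    char path[4200], buf[64];

    snprintf(path, sizeof(path), "%s/cgroup.subtree_control", z);
    snprintf(buf, sizeof(buf), "+%s", controller);
    if (write_str(path, buf) == -1)
        return -1;

    snprintf(path, sizeof(path), "%s/cgroup.procs", z);
    snprintf(buf, sizeof(buf), "%ld", (long) pid);
    return write_str(path, buf);
}
```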

One of the consequences of the above pathways to creating a
threaded subtree is that the threaded root cgroup can be a
parent only to threaded (and domain invalid) cgroups. The
threaded root cgroup can't be a parent of any domain cgroups, and
a threaded cgroup can't have a sibling that is a domain cgroup.

Using a threaded subtree
Within a threaded subtree, threaded controllers can be enabled
in each subgroup whose type has been changed to threaded; upon
doing so, the corresponding controller interface files appear
in the children of that cgroup.

A process can be moved into a threaded subtree by writing its
PID to the cgroup.procs file in one of the cgroups inside the
tree. This has the effect of making all of the threads in the
process members of the corresponding cgroup and of making the
process a member of the threaded subtree. The threads of the
process can then be spread across the threaded subtree by
writing their thread IDs (see gettid(2)) to the cgroup.threads
files in different cgroups inside the subtree. The threads of
a process must all reside in the same threaded subtree.
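The following C sketch shows the two kinds of writes just described: a PID written to cgroup.procs moves a whole process, while a thread ID written to cgroup.threads moves a single thread. The function names are hypothetical, and cgroup_dir is assumed to be the path of a cgroup inside a threaded subtree. The calling thread's ID is obtained with syscall(SYS_gettid), because glibc provides a gettid() wrapper only since version 2.30.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a decimal ID to <cgroup_dir>/<file>. */
static int write_id(const char *cgroup_dir, const char *file, long id)
{
    char path[4200], buf[32];
    int fd, saved_errno;
    ssize_t len, n;

    snprintf(path, sizeof(path), "%s/%s", cgroup_dir, file);
    len = snprintf(buf, sizeof(buf), "%ld", id);
    fd = open(path, O_WRONLY | O_TRUNC);
    if (fd == -1)
        return -1;
    n = write(fd, buf, (size_t) len);
    saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return (n == len) ? 0 : -1;
}

/* Move every thread of process pid into the cgroup. */
int move_process(const char *cgroup_dir, pid_t pid)
{
    return write_id(cgroup_dir, "cgroup.procs", pid);
}

/* Move a single thread, identified by its thread ID. */
int move_thread(const char *cgroup_dir, pid_t tid)
{
    return write_id(cgroup_dir, "cgroup.threads", tid);
}

/* Move the calling thread only. */
int move_self(const char *cgroup_dir)
{
    return move_thread(cgroup_dir, (pid_t) syscall(SYS_gettid));
}
```

A worker thread could call move_self() with the path of its target cgroup; the kernel still enforces the requirement that all threads of a process stay within the same threaded subtree.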

The cgroup.threads file is present in each cgroup (including
domain cgroups) and can be read in order to discover the set of
threads that is present in the cgroup. The set of thread IDs
obtained when reading this file is not guaranteed to be ordered
or free of duplicates.
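Since the contents of cgroup.threads may be unordered and may contain duplicates, a program that needs a clean set of thread IDs can sort the list and drop repeats, as in the following C sketch. read_cgroup_threads() is a hypothetical helper name, and cgroup_dir is assumed to be the path of a cgroup directory.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* qsort comparator for pid_t values. */
static int cmp_pid(const void *a, const void *b)
{
    pid_t x = *(const pid_t *) a;
    pid_t y = *(const pid_t *) b;

    return (x > y) - (x < y);
}

/* Read the thread IDs in <cgroup_dir>/cgroup.threads into a newly
   allocated array, sorted and with duplicates removed, since the
   kernel guarantees neither. Returns the number of IDs (the caller
   must free *tids), or -1 on error. */
long read_cgroup_threads(const char *cgroup_dir, pid_t **tids)
{
    char path[4200];
    pid_t *arr = NULL;
    size_t n = 0, cap = 0, u = 0;
    long id;
    FILE *fp;

    snprintf(path, sizeof(path), "%s/cgroup.threads", cgroup_dir);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while (fscanf(fp, "%ld", &id) == 1) {
        if (n == cap) {                     /* grow the array */
            cap = (cap == 0) ? 16 : cap * 2;
            pid_t *tmp = realloc(arr, cap * sizeof(*arr));
            if (tmp == NULL) {
                free(arr);
                fclose(fp);
                return -1;
            }
            arr = tmp;
        }
        arr[n++] = (pid_t) id;
    }
    fclose(fp);

    if (n > 1)
        qsort(arr, n, sizeof(*arr), cmp_pid);
    for (size_t i = 0; i < n; i++)          /* keep first of each run */
        if (u == 0 || arr[i] != arr[u - 1])
            arr[u++] = arr[i];
    *tids = arr;
    return (long) u;
}
```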

The cgroup.procs file in the threaded root shows the PIDs of
all processes that are members of the threaded subtree. The
cgroup.procs files in the other cgroups in the subtree are not
readable.

Domain controllers can't be enabled in a threaded subtree; no
controller-interface files appear inside the cgroups underneath
the threaded root. From the point of view of a domain
controller, threaded subtrees are invisible: a multithreaded
process inside a threaded subtree appears to a domain
controller as a process that resides in the threaded root cgroup.

Within a threaded subtree, the "no internal processes" rule
does not apply: a cgroup can both contain member processes (or
threads) and exercise controllers on child cgroups.

Rules for writing to cgroup.type and creating threaded subtrees
A number of rules apply when writing to the cgroup.type file:

* Only the string "threaded" may be written. In other words,
the only explicit transition that is possible is to convert
a domain or domain invalid cgroup to type threaded.

* The string "threaded" can be written only if the current
value in cgroup.type is one of the following:

• domain, to start the creation of a threaded subtree via
the first of the pathways described above;

• domain invalid, to convert one of the cgroups in a
threaded subtree into a usable (i.e., threaded) state;

• threaded, which has no effect (a "no-op").

* We can't write to a cgroup.type file if the parent's type is
domain invalid. In other words, the cgroups of a threaded
subtree must be converted to the threaded state in a top-
down manner.
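The rules above can be expressed as a small predicate, sketched here in C. It models only the rules in this list (the constraints on the subtree, described next, are not checked), and cgroup_type_write_allowed() is a hypothetical name; the kernel, not this function, decides whether a write actually succeeds.

```c
#include <stdbool.h>
#include <string.h>

/* Compare s with want, ignoring a trailing newline in s. */
static bool equals(const char *s, const char *want)
{
    size_t len = strcspn(s, "\n");
    return strlen(want) == len && strncmp(s, want, len) == 0;
}

/* Model of the cgroup.type write rules listed above.
   value:       the string being written
   current:     this cgroup's current cgroup.type contents
   parent_type: the parent's cgroup.type contents, or NULL if the
                parent is the root cgroup (which has no cgroup.type) */
bool cgroup_type_write_allowed(const char *value, const char *current,
                               const char *parent_type)
{
    /* Only "threaded" may be written. */
    if (!equals(value, "threaded"))
        return false;

    /* Conversion is top-down: the parent must not be domain invalid. */
    if (parent_type != NULL && equals(parent_type, "domain invalid"))
        return false;

    /* domain starts a subtree, domain invalid becomes usable,
       threaded is a no-op; any other current type is rejected. */
    return equals(current, "domain") ||
           equals(current, "domain invalid") ||
           equals(current, "threaded");
}
```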

There are also various constraints that must be satisfied in
order to create a threaded subtree rooted at the cgroup x:

* There can be no member processes in the descendant cgroups
of x. (The cgroup x can itself have member processes.)

* No domain controllers may be enabled in x's
cgroup.subtree_control file.

* The existing cgroups inside the threaded subtree must either
be of type domain or part of (unpopulated) threaded
subtrees.

If any of the above constraints is violated, then an attempt to
write "threaded" to a cgroup.type file fails with the error
ENOTSUP.
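The first two constraints can be checked from user space, as in the following C sketch; the third is not checked. The sketch relies on the "populated" field of each child's cgroup.events file, which is 1 if that cgroup or any of its descendants has member processes, so only the immediate children of x need to be examined. The threaded-controller list is taken from the Linux 4.15 list given earlier, and check_threaded_root() and its helpers are hypothetical names. A return value of 0 does not guarantee that the kernel will accept the write.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Threaded controllers as of Linux 4.15 (an assumption elsewhere). */
static bool is_threaded_ctrl(const char *name)
{
    return strcmp(name, "cpu") == 0 || strcmp(name, "perf_event") == 0 ||
           strcmp(name, "pids") == 0;
}

/* Constraint 2: returns 1 if x's cgroup.subtree_control names only
   threaded controllers, 0 if it names a domain controller, -1 on error. */
static int only_threaded_controllers(const char *x)
{
    char path[4200], name[64];
    int ok = 1;
    FILE *fp;

    snprintf(path, sizeof(path), "%s/cgroup.subtree_control", x);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while (fscanf(fp, "%63s", name) == 1)
        if (!is_threaded_ctrl(name))
            ok = 0;
    fclose(fp);
    return ok;
}

/* Value of the "populated" key in <dir>/cgroup.events, or -1. */
static int populated(const char *dir)
{
    char path[4200], key[64];
    int val, result = -1;
    FILE *fp;

    snprintf(path, sizeof(path), "%s/cgroup.events", dir);
    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    while (fscanf(fp, "%63s %d", key, &val) == 2)
        if (strcmp(key, "populated") == 0)
            result = val;
    fclose(fp);
    return result;
}

/* Check constraints 1 and 2 for a threaded subtree rooted at x.
   Returns 0 if both hold, ENOTSUP if one is violated, or -1 if
   the cgroup files can't be read. */
int check_threaded_root(const char *x)
{
    DIR *d;
    struct dirent *ent;
    int r, err = 0;

    r = only_threaded_controllers(x);
    if (r != 1)
        return (r == 0) ? ENOTSUP : -1;

    d = opendir(x);
    if (d == NULL)
        return -1;
    while (err == 0 && (ent = readdir(d)) != NULL) {
        char child[4200];
        struct stat sb;
        int pop;

        if (strcmp(ent->d_name, ".") == 0 || strcmp(ent->d_name, "..") == 0)
            continue;
        snprintf(child, sizeof(child), "%s/%s", x, ent->d_name);
        if (stat(child, &sb) == -1 || !S_ISDIR(sb.st_mode))
            continue;                   /* only child cgroups matter */
        /* Constraint 1: a populated child means some descendant
           of x has member processes. */
        pop = populated(child);
        if (pop == -1)
            err = -1;
        else if (pop != 0)
            err = ENOTSUP;
    }
    closedir(d);
    return err;
}
```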

The "domain threaded" cgroup type
According to the pathways described above, the type of a cgroup
can change to domain threaded in either of the following cases:

* The string "threaded" is written to a child cgroup.

* A threaded controller is enabled inside the cgroup and a
process is made a member of the cgroup.

A domain threaded cgroup, x, can revert to the type domain if
the above conditions no longer hold true, that is, if all
threaded child cgroups of x are removed and either x no longer
has threaded controllers enabled or no longer has member
processes.

When a domain threaded cgroup x reverts to the type domain:

* All domain invalid descendants of x that are not in lower-
level threaded subtrees revert to the type domain.

* The root cgroups in any lower-level threaded subtrees revert
to the type domain threaded.
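The conditions for the domain threaded type can be summarized as a pair of boolean functions, sketched in C below. The struct cg_state type and both functions are hypothetical; they merely restate the text and make visible that the revert condition is the exact negation (by De Morgan's laws) of the condition for becoming domain threaded.

```c
#include <stdbool.h>

/* Hypothetical summary of the state of a cgroup x. */
struct cg_state {
    int  threaded_children;     /* number of threaded child cgroups */
    bool threaded_ctrls;        /* threaded controllers enabled in x? */
    bool has_members;           /* does x have member processes? */
};

/* x is domain threaded if either pathway's condition holds. */
bool is_domain_threaded(const struct cg_state *x)
{
    return x->threaded_children > 0 ||
           (x->threaded_ctrls && x->has_members);
}

/* x reverts to domain when no threaded children remain and
   either its threaded controllers or its members are gone. */
bool reverts_to_domain(const struct cg_state *x)
{
    return x->threaded_children == 0 &&
           (!x->threaded_ctrls || !x->has_members);
}
```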

Exceptions for the root cgroup
The root cgroup of the v2 hierarchy is treated exceptionally:
it can be the parent of both domain and threaded cgroups. If
the string "threaded" is written to the cgroup.type file of one
of the children of the root cgroup, then

* The type of that cgroup becomes threaded.

* The type of any descendants of that cgroup that are not part
of lower-level threaded subtrees changes to domain invalid.

Note that in this case, there is no cgroup whose type becomes
domain threaded. (Notionally, the root cgroup can be
considered as the threaded root for the cgroup whose type was changed
to threaded.)

The aim of this exceptional treatment for the root cgroup is to
allow a threaded cgroup that employs the cpu controller to be
placed as high as possible in the hierarchy, so as to minimize
the (small) cost of traversing the cgroup hierarchy.

The cgroups v2 "cpu" controller and realtime processes
As at Linux 4.15, the cgroups v2 cpu controller does not
support control of realtime processes, and the controller can be
enabled in the root cgroup only if all realtime threads are in
the root cgroup. (If there are realtime processes in nonroot
cgroups, then a write(2) of the string "+cpu" to the
cgroup.subtree_control file fails with the error EINVAL.)
However, on some systems, systemd(1) places certain realtime
processes in nonroot cgroups in the v2 hierarchy. On such
systems, these processes must first be moved to the root cgroup
before the cpu controller can be enabled.
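To find the processes that block enabling the cpu controller, a program could scan /proc for realtime threads whose cgroups v2 membership (the "0::PATH" line of /proc/[pid]/task/[tid]/cgroup) is not the root cgroup, as in the following C sketch. It treats only SCHED_FIFO and SCHED_RR as realtime, and the function names are hypothetical. Moving the reported processes (for example, by writing their PIDs to the cgroup.procs file of the root cgroup) is left to the caller.

```c
#define _GNU_SOURCE
#include <dirent.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Is policy (as returned by sched_getscheduler(2)) a realtime one? */
bool is_realtime_policy(int policy)
{
#ifdef SCHED_RESET_ON_FORK
    policy &= ~SCHED_RESET_ON_FORK;     /* flag may be ORed in */
#endif
    return policy == SCHED_FIFO || policy == SCHED_RR;
}

/* Scan a /proc/.../cgroup file for the cgroups v2 entry ("0::PATH")
   and copy PATH into path. Returns 0, or -1 if there is none. */
int v2_path_from_proc_cgroup(FILE *fp, char *path, size_t size)
{
    char line[4096];

    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strncmp(line, "0::", 3) == 0) {
            line[strcspn(line, "\n")] = '\0';
            snprintf(path, size, "%s", line + 3);
            return 0;
        }
    }
    return -1;
}

/* Print "TID PATH" for each realtime thread whose v2 cgroup is not
   the root ("/"). Returns the count, or -1 if /proc can't be read. */
int report_rt_threads_outside_root(void)
{
    DIR *proc = opendir("/proc");
    struct dirent *p;
    int count = 0;

    if (proc == NULL)
        return -1;
    while ((p = readdir(proc)) != NULL) {
        char taskdir[300];
        DIR *tasks;
        struct dirent *t;

        if (p->d_name[0] < '0' || p->d_name[0] > '9')
            continue;                       /* not a PID directory */
        snprintf(taskdir, sizeof(taskdir), "/proc/%s/task", p->d_name);
        tasks = opendir(taskdir);
        if (tasks == NULL)
            continue;                       /* process already exited */
        while ((t = readdir(tasks)) != NULL) {
            char cgfile[600], cgpath[4096];
            pid_t tid;
            FILE *fp;

            if (t->d_name[0] < '0' || t->d_name[0] > '9')
                continue;
            tid = (pid_t) strtol(t->d_name, NULL, 10);
            if (!is_realtime_policy(sched_getscheduler(tid)))
                continue;
            snprintf(cgfile, sizeof(cgfile), "%s/%s/cgroup", taskdir, t->d_name);
            fp = fopen(cgfile, "r");
            if (fp == NULL)
                continue;
            if (v2_path_from_proc_cgroup(fp, cgpath, sizeof(cgpath)) == 0 &&
                strcmp(cgpath, "/") != 0) {
                printf("%ld %s\n", (long) tid, cgpath);
                count++;
            }
            fclose(fp);
        }
        closedir(tasks);
    }
    closedir(proc);
    return count;
}
```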
]]

Cheers,

Michael

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/