[PATCH v7 06/11] sched: document the cpu cgroup.

From: Glauber Costa
Date: Wed May 29 2013 - 07:03:39 EST


The CPU cgroup is so far, undocumented. Although data exists in the
Documentation directory about its functioning, it is usually spread,
and/or presented in the context of something else. This file
consolidates all cgroup-related information about it.

Signed-off-by: Glauber Costa <glommer@xxxxxxxxxx>
---
Documentation/cgroups/cpu.txt | 81 +++++++++++++++++++++++++++++++++++++++++++
1 file changed, 81 insertions(+)
create mode 100644 Documentation/cgroups/cpu.txt

diff --git a/Documentation/cgroups/cpu.txt b/Documentation/cgroups/cpu.txt
new file mode 100644
index 0000000..072fd58
--- /dev/null
+++ b/Documentation/cgroups/cpu.txt
@@ -0,0 +1,81 @@
+CPU Controller
+--------------
+
+The CPU controller is responsible for grouping tasks together that will be
+viewed by the scheduler as a single unit. The CFS scheduler will first divide
+CPU time equally between all entities in the same level, and then proceed by
+doing the same in the next level. Basic use cases for that are described in the
+main cgroup documentation file, cgroups.txt.
+
+Users of this functionality should be aware that deep hierarchies will of
+course impose scheduler overhead, since the scheduler will have to take extra
+steps and look up additional data structures to make its final decision.
+
+Through the CPU controller, the scheduler is also able to cap the CPU
+utilization of a particular group. This is particularly useful in environments
+in which CPU is paid for by the hour, and one values predictability over
+performance.
+
+CPU Accounting
+--------------
+
+The CPU cgroup will also provide additional files under the prefix "cpuacct".
+Those files provide accounting statistics and were previously provided by the
+separate cpuacct controller. Although the cpuacct controller will still be kept
+around for compatibility reasons, its usage is discouraged. If both the CPU and
+cpuacct controllers are present in the system, distributors are encouraged to
+always mount them together.
+
+Files
+-----
+
+The CPU controller exposes the following files to the user:
+
+ - cpu.shares: The weight of each group living in the same hierarchy, that
+ translates into the amount of CPU it is expected to get. Upon cgroup creation,
+ each group gets assigned a default of 1024. The percentage of CPU assigned to
+ the cgroup is the value of shares divided by the sum of all shares in all
+ cgroups in the same level.
+
+ - cpu.cfs_period_us: The duration in microseconds of each scheduler period, for
+ bandwidth decisions. This defaults to 100000us or 100ms. Larger periods will
+ improve throughput at the expense of latency, since the scheduler will be able
+ to sustain a cpu-bound workload for longer. The opposite of true for smaller
+ periods. Note that this only affects non-RT tasks that are scheduled by the
+ CFS scheduler.
+
+- cpu.cfs_quota_us: The maximum time in microseconds during each cfs_period_us
+ in for the current group will be allowed to run. For instance, if it is set to
+ half of cpu_period_us, the cgroup will only be able to peak run for 50 % of
+ the time. One should note that this represents aggregate time over all CPUs
+ in the system. Therefore, in order to allow full usage of two CPUs, for
+ instance, one should set this value to twice the value of cfs_period_us.
+
+- cpu.stat: statistics about the bandwidth controls. No data will be presented
+ if cpu.cfs_quota_us is not set. The file presents three
+ numbers:
+ nr_periods: how many full periods have been elapsed.
+ nr_throttled: number of times we exausted the full allowed bandwidth
+ throttled_time: total time the tasks were not run due to being overquota
+
+ - cpu.rt_runtime_us and cpu.rt_period_us: Those files are the RT-tasks
+ analogous to the CFS files cfs_quota_us and cfs_period_us. One important
+ difference, though, is that while the cfs quotas are upper bounds that
+ won't necessarily be met, the rt runtimes form a stricter guarantee.
+ Therefore, no overlap is allowed. Implications of that are that given a
+ hierarchy with multiple children, the sum of all rt_runtime_us may not exceed
+ the runtime of the parent. Also, a rt_runtime_us of 0, means that no rt tasks
+ can ever be run in this cgroup. For more information about rt tasks runtime
+ assignments, see scheduler/sched-rt-group.txt
+
+ - cpuacct.usage: The aggregate CPU time, in nanoseconds, consumed by all tasks
+ in this group.
+
+ - cpuacct.usage_percpu: The CPU time, in nanoseconds, consumed by all tasks in
+ this group, separated by CPU. The format is an space-separated array of time
+ values, one for each present CPU.
+
+ - cpuacct.stat: aggregate user and system time consumed by tasks in this group.
+ The format is
+ user: x
+ system: y
--
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/