[PATCH 1/8] Documentation, x86: Documentation for Intel Mem b/w allocation

From: Vikas Shivappa
Date: Fri Feb 17 2017 - 15:00:29 EST

Update the intel_rdt_ui documentation to describe the Memory bandwidth (b/w)
allocation interface and its usage.

Signed-off-by: Vikas Shivappa <vikas.shivappa@xxxxxxxxxxxxxxx>
Documentation/x86/intel_rdt_ui.txt | 74 ++++++++++++++++++++++++++++++++++----
1 file changed, 67 insertions(+), 7 deletions(-)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index d918d26..2f679e2 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -4,6 +4,7 @@ Copyright (C) 2016 Intel Corporation

Fenghua Yu <fenghua.yu@xxxxxxxxx>
Tony Luck <tony.luck@xxxxxxxxx>
+Vikas Shivappa <vikas.shivappa@xxxxxxxxx>

This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
@@ -22,8 +23,8 @@ Info directory

The 'info' directory contains information about the enabled
resources. Each resource has its own subdirectory. The subdirectory
-names reflect the resource names. Each subdirectory contains the
-following files:
+names reflect the resource names.
+Each cache resource (L3/L2) subdirectory contains the following files:

"num_closids": The number of CLOSIDs which are valid for this
resource. The kernel uses the smallest number of
@@ -35,6 +36,16 @@ following files:
"min_cbm_bits": The minimum number of consecutive bits which must be
set when writing a mask.

+The memory bandwidth (MB) subdirectory contains the following files:
+"min_bw": The minimum memory bandwidth percentage which a user can
+ request.
+"bw_gran": The granularity (step size) in which the memory bandwidth
+ percentage can be requested.
+"scale_linear": Indicates whether the bandwidth scale is linear or
+ non-linear.

Resource groups
@@ -107,6 +118,28 @@ and 0xA are not. On a system with a 20-bit mask each bit represents 5%
of the capacity of the cache. You could partition the cache into four
equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.

+Memory bandwidth (b/w) throttle
+For the memory b/w resource, the user controls the resource by specifying
+a percentage of the total memory b/w.
+The minimum bandwidth percentage value for each CPU model is predefined
+and can be looked up through "info/MB/min_bw". The bandwidth granularity
+that can be requested also depends on the CPU model and can be
+looked up at "info/MB/bw_gran".
+The bandwidth percentage values are mapped to hardware delay values and
+programmed in the QOS_MSRs. The delay values may be on a linear or a
+non-linear scale. On a linear scale, the programmed value corresponds
+directly to a delay value (b/w percentage = 100 - delay). On a
+non-linear scale, however, the percentage values correspond to
+pre-calibrated delay values, and those delay values have a granularity
+of powers of 2.
+Bandwidth throttling is a core-specific mechanism on some Intel
+SKUs. Using a high bandwidth setting and a low bandwidth setting on two
+threads sharing a core will result in both threads being throttled to
+the low bandwidth.
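A minimal C sketch of the linear-scale mapping described above. It assumes that a request below min_bw (or above 100) is rejected and that an accepted request is rounded up to the next multiple of bw_gran before being converted with delay = 100 - bandwidth. mb_linear_delay() is a hypothetical name, not a kernel function, and the pre-calibrated non-linear table is model specific and is not modelled here.

```c
/*
 * Hypothetical helper: map a requested bandwidth percentage to a
 * linear-scale delay value (delay = 100 - bandwidth).
 * min_bw and bw_gran stand for the values read from info/MB/min_bw
 * and info/MB/bw_gran. Rounding up to bw_gran is an assumption of
 * this sketch. Returns the delay, or -1 if the request is invalid.
 */
static int mb_linear_delay(unsigned int bw, unsigned int min_bw,
			   unsigned int bw_gran)
{
	if (bw_gran == 0 || bw < min_bw || bw > 100)
		return -1;

	/* Round the request up to the next multiple of the granularity. */
	bw = ((bw + bw_gran - 1) / bw_gran) * bw_gran;
	if (bw > 100)
		bw = 100;

	return (int)(100 - bw);
}
```

With min_bw = 10 and bw_gran = 10, a request of 30 maps to a delay of 70, and a request of 45 is rounded up to 50 (delay 50). Rounding up rather than down is a choice made in this sketch so that a group never receives less b/w than it asked for.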

L3 details (code and data prioritization disabled)
@@ -129,16 +162,24 @@ schemata format is always:


+Memory b/w allocation details
+The memory b/w domain is the L3 cache.
+ MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
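A minimal C sketch of parsing one "MB:" line in the format above into (cache_id, bandwidth) pairs. struct mb_entry and parse_mb_line() are hypothetical names for illustration; this is not the kernel's resctrl parser, and checks against min_bw and bw_gran are deliberately left out.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical representation of one "<cache_id>=<bandwidth>" entry. */
struct mb_entry {
	unsigned int cache_id;
	unsigned int bw;
};

/*
 * Parse "MB:<id0>=<bw0>;<id1>=<bw1>;..." into at most max entries.
 * Returns the number of entries parsed, or -1 on malformed input or
 * if there are more than max entries. Empty segments such as a
 * trailing ';' are skipped.
 */
static int parse_mb_line(const char *line, struct mb_entry *out, int max)
{
	const char *p;
	int n = 0;

	if (strncmp(line, "MB:", 3) != 0)
		return -1;
	p = line + 3;

	while (*p != '\0') {
		char *end;
		unsigned long id, bw;

		if (*p == ';') {	/* skip empty segments */
			p++;
			continue;
		}
		/* The cache ID must be a plain decimal number followed by '='. */
		if (!isdigit((unsigned char)*p))
			return -1;
		id = strtoul(p, &end, 10);
		if (*end != '=')
			return -1;
		p = end + 1;

		/* The bandwidth must be decimal, ending at ';' or end of line. */
		if (!isdigit((unsigned char)*p))
			return -1;
		bw = strtoul(p, &end, 10);
		if (*end != ';' && *end != '\0')
			return -1;

		if (n == max)
			return -1;
		out[n].cache_id = (unsigned int)id;
		out[n].bw = (unsigned int)bw;
		n++;
		p = end;
	}
	return n;
}
```

For example, parse_mb_line("MB:0=50;1=50", e, 4) fills two entries and returns 2, while a line starting with "L3:" returns -1.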
Example 1
On a two socket machine (one L3 cache per socket) with just four bits
-for cache bit masks
+for cache bit masks, a minimum b/w of 10% and a memory bandwidth
+granularity of 10%.

# mount -t resctrl resctrl /sys/fs/resctrl
# cd /sys/fs/resctrl
# mkdir p0 p1
-# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
-# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
+# echo -e "L3:0=3;1=c\nMB:0=50;1=50" > /sys/fs/resctrl/p0/schemata
+# echo -e "L3:0=3;1=3\nMB:0=50;1=50" > /sys/fs/resctrl/p1/schemata

The default resource group is unmodified, so we have access to all parts
of all caches (its schemata file reads "L3:0=f;1=f").
@@ -147,6 +188,14 @@ Tasks that are under the control of group "p0" may only allocate from the
"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
Tasks in group "p1" use the "lower" 50% of cache on both sockets.

+Similarly, tasks that are under the control of group "p0" may use a
+maximum memory b/w of 50% on socket 0 and 50% on socket 1.
+Tasks in group "p1" may likewise use a maximum of 50% memory b/w on both
+sockets. Note that, unlike cache masks, memory b/w allocations cannot
+specify whether they overlap or not. Each allocation specifies the
+maximum b/w that the group may be able to use, and the system
+administrator can configure the b/w accordingly.
Example 2
Again two sockets, but this time with a more realistic 20-bit mask.
@@ -185,6 +234,16 @@ Ditto for the second real time task (with the remaining 25% of cache):
# echo 5678 > p1/tasks
# taskset -cp 2 5678

+For the same 2-socket system with the memory b/w resource and CAT L3, the
+schemata would look like this (assuming min_bw is 10 and bw_gran is 10):
+
+# echo -e "L3:0=f8000;1=fffff\nMB:0=10;1=30" > p0/schemata
+
+This would request 10% memory b/w on socket 0 and 30% memory b/w on
+socket 1.
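The text written to a schemata file holds one resource per line, so the L3 and MB lines need a real newline character between them; that is why `echo -e` is used in the command above. A minimal C sketch of the same write, assuming POSIX open(2)/write(2) and issuing the whole text in one write() call. write_schemata() is a hypothetical helper, and the path is whichever schemata file you target.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a complete schemata text (one resource
 * per line, lines separated by '\n') to an existing file at path using
 * a single write() call.
 * Returns 0 on success, -1 on error (errno is set by open/write/close).
 */
static int write_schemata(const char *path, const char *text)
{
	size_t len = strlen(text);
	ssize_t n;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	n = write(fd, text, len);
	if (n < 0 || (size_t)n != len) {
		close(fd);
		return -1;
	}
	return close(fd);
}
```

Calling write_schemata("p0/schemata", "L3:0=f8000;1=fffff\nMB:0=10;1=30\n") corresponds to the echo -e command above, including the trailing newline that echo appends.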
Example 3

@@ -203,10 +262,11 @@ First we reset the schemata for the default group so that the "upper"
# echo "L3:0=3ff" > schemata

Next we make a resource group for our real time cores and give
-it access to the "top" 50% of the cache on socket 0.
+it access to the "top" 50% of the cache on socket 0 and 50% of memory
+bandwidth on socket 0.

# mkdir p0
-# echo "L3:0=ffc00;" > p0/schemata
+# echo -e "L3:0=ffc00;\nMB:0=50" > p0/schemata

Finally we move core 4-7 over to the new group and make sure that the
kernel and the tasks running there get 50% of the cache.