[PATCH 1/6] x86/intel_rdt/mba_sc: Add documentation for MBA software controller

From: Vikas Shivappa
Date: Thu Mar 29 2018 - 18:29:37 EST


Add documentation about usage, including the "schemata" format and the
use case for the MBA software controller.

Signed-off-by: Vikas Shivappa <vikas.shivappa@xxxxxxxxxxxxxxx>
---
Documentation/x86/intel_rdt_ui.txt | 101 +++++++++++++++++++++++++++++++++++++
1 file changed, 101 insertions(+)

diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
index 71c3098..3b9634e 100644
--- a/Documentation/x86/intel_rdt_ui.txt
+++ b/Documentation/x86/intel_rdt_ui.txt
@@ -315,6 +315,90 @@ Memory b/w domain is L3 cache.

MB:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...

+Memory bandwidth (b/w) in MegaBytes
+-----------------------------------
+
+Memory bandwidth is a core-specific mechanism, which means that when
+the memory b/w percentage is specified in the schemata per package, it
+is actually applied on a per-core basis via the IA32_MBA_THRTL_MSR
+interface. This may lead to confusion in the scenarios below:
+
+1. The user may not see an increase in the actual b/w when the
+   percentage value is increased:
+
+This can occur when the aggregate L2 external b/w is more than the L3
+external b/w. Consider an SKL SKU with 24 cores on a package, where
+the L2 external b/w is 10GBps (hence the aggregate L2 external b/w is
+240GBps) and the L3 external b/w is 100GBps. Now a workload with '20
+threads, having 50% b/w, each consuming 5GBps' consumes the max L3 b/w
+of 100GBps, although the percentage value specified is only
+50% << 100%. Hence increasing the b/w percentage will not yield any
+more b/w. This is because although the L2 external b/w still has
+capacity, the L3 external b/w is fully used. Also note that this
+depends on the number of cores the benchmark is run on.
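+
+With the numbers above (all of them illustrative values for this
+example, not measurements), the bottleneck works out as:
+
+   20 threads * 5GBps  = 100GBps consumed
+   24 cores * 10GBps   = 240GBps aggregate L2 external b/w
+   L3 external b/w     = 100GBps, hence fully used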
+
+2. The same b/w percentage may mean a different actual b/w depending
+   on the number of threads:
+
+For the same SKU as in #1, a 'single thread, with 10% b/w' and '4
+threads, with 10% b/w' can consume up to 10GBps and 40GBps
+respectively, although they have the same percentage b/w of 10%. This
+is simply because as threads start using more cores in an rdtgroup,
+the actual b/w may increase or vary even though the user-specified b/w
+percentage is the same, as the per-core ceilings below illustrate.
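+
+Since the throttling is applied per core, the achievable upper bound
+scales with the number of cores the threads run on (again using the
+illustrative 10GBps per-core L2 external b/w from #1):
+
+   1 thread  on 1 core : up to 1 * 10GBps = 10GBps
+   4 threads on 4 cores: up to 4 * 10GBps = 40GBps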
+
+In order to mitigate this and make the interface more user friendly,
+we can let the user specify the max bandwidth per rdtgroup in bytes
+(or megabytes). The kernel underneath would use a software feedback
+mechanism, or a "Software Controller", which reads the actual b/w
+using MBM counters and adjusts the memory bandwidth percentages to
+ensure that "actual b/w < user b/w".
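+
+The idea can be illustrated from user space as well. The sketch below
+is only an approximation of the concept (the in-kernel controller
+works on the MBM counters directly); it assumes monitoring is enabled,
+a group "p0" exists, and its MBM data is under mon_data/mon_L3_00:
+
+# cd /sys/fs/resctrl/p0
+# b0=$(cat mon_data/mon_L3_00/mbm_total_bytes); sleep 1
+# b1=$(cat mon_data/mon_L3_00/mbm_total_bytes)
+# echo "actual b/w: $(( (b1 - b0) / 1048576 )) MBps"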
+
+The legacy behaviour is the default, and the user can switch to the
+"MBA software controller" mode using the mount option 'mba_MB'.
+
+To use the feature, mount the file system with the 'mba_MB' option:
+
+# mount -t resctrl resctrl [-o cdp[,cdpl2][,mba_MB]] /sys/fs/resctrl
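+
+Whether the option took effect can be checked in the mount table, e.g.
+(an illustrative check, not part of the resctrl interface itself):
+
+# grep resctrl /proc/mounts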
+
+The schemata format is as follows:
+
+Memory b/w Allocation in Megabytes
+----------------------------------
+
+Memory b/w domain is L3 cache.
+
+ MB:<cache_id0>=bw_MB0;<cache_id1>=bw_MB1;...
+
Reading/writing the schemata file
---------------------------------
Reading the schemata file will show the state of all resources
@@ -358,6 +442,23 @@ allocations can overlap or not. The allocations specifies the maximum
b/w that the group may be able to use and the system admin can configure
the b/w accordingly.

+If the MBA is specified in MB (megabytes), then the user can enter the
+max b/w in MB rather than the percentage values.
+
+# echo "L3:0=3;1=c\nMB:0=1024;1=500" > /sys/fs/resctrl/p0/schemata
+# echo "L3:0=3;1=3\nMB:0=1024;1=500" > /sys/fs/resctrl/p1/schemata
+
+In the above example the tasks in "p1" and "p0" on socket 0 would use
+a max b/w of 1024MB, whereas on socket 1 they would use 500MB.
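+
+The resulting limits can be read back from the schemata files (the
+exact output formatting, such as zero padding of the masks, may
+differ):
+
+# cat /sys/fs/resctrl/p0/schemata
+L3:0=3;1=c
+MB:0=1024;1=500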
+
Example 2
---------
Again two sockets, but this time with a more realistic 20-bit mask.
--
1.9.1