[PATCH v3 1/2] Documentation: x86: Rename resctrl_ui.rst and add two errata to the file

From: Fenghua Yu
Date: Tue Oct 13 2020 - 20:49:40 EST


Intel Memory Bandwidth Monitoring (MBM) counters may report system
memory bandwidth incorrectly on some Intel processors. The errata are
reported in erratum SKX99 [1], erratum BDF102 [2] and RDT reference
manual [3].

To work around the errata, MBM total and local readings are corrected
using a correction factor table.

Since the correction factor table is not publicly documented anywhere,
the table and the errata are documented in Documentation/x86/resctrl.rst
for future reference. The resctrl.rst file is renamed from
Documentation/x86/resctrl_ui.rst because the file won't contain user
interface only anymore.

1. Erratum SKX99 in Intel Xeon Processor Scalable Family Specification
Update:
http://web.archive.org/web/20200716124958/https://www.intel.com/content/www/us/en/processors/xeon/scalable/xeon-scalable-spec-update.html
2. Erratum BDF102 in Intel Xeon E5-2600 v4 Processor Product Family
Specification Update:
http://web.archive.org/web/20191125200531/https://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v4-spec-update.pdf
3. The errata in Intel Resource Director Technology (Intel RDT) on 2nd
Generation Intel Xeon Scalable Processors Reference Manual:
https://software.intel.com/content/www/us/en/develop/articles/intel-resource-director-technology-rdt-reference-manual.html

Suggested-by: Borislav Petkov <bp@xxxxxxxxx>
Signed-off-by: Fenghua Yu <fenghua.yu@xxxxxxxxx>
Reviewed-by: Tony Luck <tony.luck@xxxxxxxxx>
---
Change Log:
v3:
- Remove unnecessary conf.py change in patch 1 (Randy).

v2:
- Document the correction factor table and errata in resctrl.rst (Boris).
- Change the documentation URLs to stable archive.org (Tony).

Documentation/x86/index.rst | 2 +-
.../x86/{resctrl_ui.rst => resctrl.rst} | 82 +++++++++++++++++++
2 files changed, 83 insertions(+), 1 deletion(-)
rename Documentation/x86/{resctrl_ui.rst => resctrl.rst} (92%)

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 265d9e9a093b..49d2fd9f0e5b 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -25,7 +25,7 @@ x86-specific Documentation
pti
mds
microcode
- resctrl_ui
+ resctrl
tsx_async_abort
usb-legacy-support
i386/index
diff --git a/Documentation/x86/resctrl_ui.rst b/Documentation/x86/resctrl.rst
similarity index 92%
rename from Documentation/x86/resctrl_ui.rst
rename to Documentation/x86/resctrl.rst
index e59b7b93a9b4..8b8ca6de5e1f 100644
--- a/Documentation/x86/resctrl_ui.rst
+++ b/Documentation/x86/resctrl.rst
@@ -1209,3 +1209,85 @@ View the llc occupancy snapshot::

# cat /sys/fs/resctrl/p1/mon_data/mon_L3_00/llc_occupancy
11234000
+
+Intel RDT Errata
+================
+
+Intel MBM Counters May Report System Memory Bandwidth Incorrectly
+-----------------------------------------------------------------
+
+Errata SKX99 for Skylake server and BDF102 for Broadwell server.
+
+Problem: Intel Memory Bandwidth Monitoring (MBM) counters track metrics
+according to the assigned Resource Monitor ID (RMID) for that logical core.
+The IA32_QM_CTR register(MSR 0xC8E), used to report these metrics, may
+report incorrect system bandwidth for certain RMID values.
+
+Implication: Due to the errata, system memory bandwidth may not match
+what is reported.
+
+Workaround: The kernel works around the errata.
+
+MBM total and local readings are corrected by the following correction
+factor table for the errata:
+
++---------------+---------------+---------------+-----------------+
+|core count |rmid count |rmid threshold |correction factor|
++---------------+---------------+---------------+-----------------+
+|1 |8 |0 |1.000000 |
++---------------+---------------+---------------+-----------------+
+|2 |16 |0 |1.000000 |
++---------------+---------------+---------------+-----------------+
+|3 |24 |15 |0.969650 |
++---------------+---------------+---------------+-----------------+
+|4 |32 |0 |1.000000 |
++---------------+---------------+---------------+-----------------+
+|6 |48 |31 |0.969650 |
++---------------+---------------+---------------+-----------------+
+|7 |56 |47 |1.142857 |
++---------------+---------------+---------------+-----------------+
+|8 |64 |0 |1.000000 |
++---------------+---------------+---------------+-----------------+
+|9 |72 |63 |1.185115 |
++---------------+---------------+---------------+-----------------+
+|10 |80 |63 |1.066553 |
++---------------+---------------+---------------+-----------------+
+|11 |88 |79 |1.454545 |
++---------------+---------------+---------------+-----------------+
+|12 |96 |0 |1.000000 |
++---------------+---------------+---------------+-----------------+
+|13 |104 |95 |1.230769 |
++---------------+---------------+---------------+-----------------+
+|14 |112 |95 |1.142857 |
++---------------+---------------+---------------+-----------------+
+|15 |120 |95 |1.066667 |
++---------------+---------------+---------------+-----------------+
+|16 |128 |0 |1.000000 |
++---------------+---------------+---------------+-----------------+
+|17 |136 |127 |1.254863 |
++---------------+---------------+---------------+-----------------+
+|18 |144 |127 |1.185255 |
++---------------+---------------+---------------+-----------------+
+|19 |152 |0 |1.000000 |
++---------------+---------------+---------------+-----------------+
+|20 |160 |127 |1.066667 |
++---------------+---------------+---------------+-----------------+
+|21 |168 |0 |1.000000 |
++---------------+---------------+---------------+-----------------+
+|22 |176 |159 |1.454334 |
++---------------+---------------+---------------+-----------------+
+|23 |184 |0 |1.000000 |
++---------------+---------------+---------------+-----------------+
+|24 |192 |127 |0.969744 |
++---------------+---------------+---------------+-----------------+
+|25 |200 |191 |1.280246 |
++---------------+---------------+---------------+-----------------+
+|26 |208 |191 |1.230921 |
++---------------+---------------+---------------+-----------------+
+|27 |216 |0 |1.000000 |
++---------------+---------------+---------------+-----------------+
+|28 |224 |191 |1.143118 |
++---------------+---------------+---------------+-----------------+
+
+If rmid > rmid threshold, MBM total and local values should be multiplied
+by the correction factor.
--
2.28.0