[PATCH v10 0/8] Add support for Sub-NUMA cluster (SNC) systems

From: Tony Luck
Date: Tue Oct 31 2023 - 17:17:20 EST


The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
that share an L3 cache into two or more sets. This plays havoc with the
Resource Director Technology (RDT) monitoring features. Prior to this
patch Intel has advised that SNC and RDT are incompatible.

Some of these CPUs support an MSR that can partition the RMID counters in
the same way. This allows monitoring features to be used, with the caveat
that users must be aware that Linux may migrate tasks more frequently
between SNC nodes than between "regular" NUMA nodes, so reading counters
from all SNC nodes may be needed to get a complete picture of activity
for tasks.
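
To illustrate that last point, here is a minimal user-space sketch of
combining per-node readings for one monitoring group. It is not part of
this series: snc_sum_counters() is a hypothetical helper, and it assumes
the caller has already read one counter value per SNC node (how those
values are obtained from resctrl is deliberately left out).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of this series): combine the values a
 * user read from each SNC node's counter for one monitoring group.
 * per_node[i] holds the reading for SNC node i; n is the node count.
 */
uint64_t snc_sum_counters(const uint64_t *per_node, size_t n)
{
	uint64_t total = 0;
	size_t i;

	/* A NULL array is only acceptable when there is nothing to sum. */
	assert(per_node != NULL || n == 0);

	for (i = 0; i < n; i++)
		total += per_node[i];

	return total;
}
```

For example, if a task ran on both nodes of a two-node SNC system and the
per-node readings were 100 and 250, the combined activity is 350. Reading
only one node would under-report it.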

Cache and memory bandwidth allocation features continue to operate at
the scope of the L3 cache.

Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>

---

Dropped Peter's "Reviewed-by" from all but parts 5 & 8 since there
have been many changes since he provided those.

Other changes since v9 (all from Reinette's comments)

global s/cpu/CPU/ in commit messages and code comments

#1
New test for invalid domain id before calling rdt_find_domain() means that
error handling in that function and at all call-sites can be simplified.
In pseudo_lock_region_init() use the new enum resctrl_scope for local variable.

#2
Include *all* common fields in the rdt_domain_hdr. Defer adding "type" until it is
used later in part #3.

#3
Fix commit to be specific that only the RDT_RESOURCE_L3 resource is going
to have different monitor and control scope.
Rename get_domain_from_cpu() -> get_ctrl_domain_from_cpu()
Rewrite comment for rdt_find_domains().
Add "type" field to rdt_domain_hdr structure.
Delete the /* RDT_RESOURCE_MBA is never mon_capable */ comment.

#4
Comment against patch 4, but now fixed in patch #2. cpu_mask
is included in common header.

#5
No comments. No changes.

#6
Fixed missing word s/monitoring on Intel/monitoring on an Intel/
Deleted "A later patch" paragraph.
Expanded description of how values are "adjusted" for mon_scale
and cache size.
Changed type of "snc_nodes_per_l3_cache" to "unsigned int".

#7
Expand h/w to hardware (commit and code comments)
Remove "earlier commit" reference
s/counnter/counter/
Check for offline CPUs and warn user SNC detection may be broken.

#8
No comments. No changes.

Tony Luck (8):
x86/resctrl: Prepare for new domain scope
x86/resctrl: Prepare to split rdt_domain structure
x86/resctrl: Prepare for different scope for control/monitor operations
x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
x86/resctrl: Add node-scope to the options for feature scope
x86/resctrl: Introduce snc_nodes_per_l3_cache
x86/resctrl: Sub NUMA Cluster detection and enable
x86/resctrl: Update documentation with Sub-NUMA cluster changes

Documentation/arch/x86/resctrl.rst | 23 +-
include/linux/resctrl.h | 87 +++--
arch/x86/include/asm/msr-index.h | 1 +
arch/x86/kernel/cpu/resctrl/internal.h | 66 ++--
arch/x86/kernel/cpu/resctrl/core.c | 411 +++++++++++++++++-----
arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 58 +--
arch/x86/kernel/cpu/resctrl/monitor.c | 68 ++--
arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 26 +-
arch/x86/kernel/cpu/resctrl/rdtgroup.c | 149 ++++----
9 files changed, 607 insertions(+), 282 deletions(-)


base-commit: 5a6a09e97199d6600d31383055f9d43fbbcbe86f
--
2.41.0