Re: [PATCH v20 00/18] Add support for Sub-NUMA cluster (SNC) systems

From: Moger, Babu
Date: Thu Jun 13 2024 - 15:18:59 EST


Hi Reinette,

I may be a little out of sync here. Also, sorry for coming back so late in
the series.

Looking at the series again, I see that this approach adds a lot of code.
Look at this change to struct rdt_resource:


@@ -187,10 +196,12 @@ struct rdt_resource {
 	bool			alloc_capable;
 	bool			mon_capable;
 	int			num_rmid;
-	enum resctrl_scope	scope;
+	enum resctrl_scope	ctrl_scope;
+	enum resctrl_scope	mon_scope;
 	struct resctrl_cache	cache;
 	struct resctrl_membw	membw;
-	struct list_head	domains;
+	struct list_head	ctrl_domains;
+	struct list_head	mon_domains;
 	char			*name;
 	int			data_width;
 	u32			default_ctrl;

There are now two scope fields and two domain-list fields in the same
resource.

These are confusing and will be hard to maintain. Also, I am not sure
these fields are useful for anything other than the SNC feature. This
approach adds quite a bit of code for no specific advantage.

Why don't we just split the RDT_RESOURCE_L3 resource into separate
resources, one for control and one for monitoring? We already have
"control"-only resources (MBA, SMBA, L2). Let's create a new
"monitor"-only resource. I feel it would be a much cleaner approach.

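To make the idea concrete, here is a rough user-space sketch. It is not
kernel code and it is not taken from the v15-RFC; every sketch_* and
SKETCH_* name is made up for illustration. The assumption is that each
resource keeps a single scope (and, in the real struct, a single domain
list), and that only the hypothetical monitor-only L3 resource changes
scope on an SNC system.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative scope values; names are assumptions, not the kernel's. */
enum sketch_scope {
	SKETCH_SCOPE_L2_CACHE,
	SKETCH_SCOPE_L3_CACHE,
	SKETCH_SCOPE_L3_NODE,	/* one SNC node */
};

/* Hypothetical resource ids: L3 split into control-only and monitor-only. */
enum sketch_res_id {
	SKETCH_RES_L3,		/* control only */
	SKETCH_RES_L3_MON,	/* monitor only (hypothetical new resource) */
	SKETCH_RES_MBA,		/* control only */
	SKETCH_NUM_RES,
};

/* One scope per resource; a real struct would also hold one domain list. */
struct sketch_resource {
	const char		*name;
	bool			alloc_capable;
	bool			mon_capable;
	enum sketch_scope	scope;
};

static struct sketch_resource sketch_resources[SKETCH_NUM_RES] = {
	[SKETCH_RES_L3] = {
		.name = "L3", .alloc_capable = true,
		.scope = SKETCH_SCOPE_L3_CACHE,
	},
	[SKETCH_RES_L3_MON] = {
		.name = "L3_MON", .mon_capable = true,
		.scope = SKETCH_SCOPE_L3_CACHE,
	},
	[SKETCH_RES_MBA] = {
		.name = "MB", .alloc_capable = true,
		.scope = SKETCH_SCOPE_L3_CACHE,
	},
};

/* Look up a resource by id. */
static struct sketch_resource *sketch_get(enum sketch_res_id id)
{
	assert(id < SKETCH_NUM_RES);
	return &sketch_resources[id];
}

/* With SNC enabled, only the monitor-only resource narrows its scope. */
static void sketch_enable_snc(void)
{
	sketch_get(SKETCH_RES_L3_MON)->scope = SKETCH_SCOPE_L3_NODE;
}
```

With this layout the control resources are untouched when SNC is on, and
code that walks domains never has to pick between two lists.
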
Tony has already tried that approach and showed that it is much simpler.

v15-RFC :
https://lore.kernel.org/lkml/20240130222034.37181-1-tony.luck@xxxxxxxxx/

What do you think?

Thanks
Babu


On 6/10/24 13:35, Tony Luck wrote:
> This series is based on top of tip x86/cache commit f385f0246394
> ("x86/resctrl: Replace open coded cacheinfo searches")
>
> The Sub-NUMA cluster feature on some Intel processors partitions the CPUs
> that share an L3 cache into two or more sets. This plays havoc with the
> Resource Director Technology (RDT) monitoring features. Prior to this
> patch Intel has advised that SNC and RDT are incompatible.
>
> Some of these CPUs support an MSR that can partition the RMID counters
> in the same way. This allows monitoring features to be used. Legacy
> monitoring files provide the sum of counters from each SNC node for
> backwards compatibility. Additional files per SNC node provide details
> per node.
>
> Memory bandwidth allocation features continue to operate at
> the scope of the L3 cache.
>
> L3 cache occupancy and allocation operate on the portion of
> L3 cache available for each SNC node.
>
> Signed-off-by: Tony Luck <tony.luck@xxxxxxxxx>
>
> ---
> Changes since v19: https://lore.kernel.org/all/20240528222006.58283-1-tony.luck@xxxxxxxxx/
>
> 1-4: Refactor on top of <linux/cacheinfo.h> change.
> Nothing functional.
>
> 5: No change
>
> 6: Updated commit message with note about RMID Sharing mode.
> Renamed __rmid_read() to __rmid_read_phys() and performed
> translation from logical RMID to physical RMID at callsites.
> Updated comment for __rmid_read_phys() with explanation of
> logical/physical RMIDs. Consistently use "SNC node", avoid
> "SNC domain". Add specifics for non-SNC mode.
> Joined split line on __rmid_read() definition (even with the
> added "_phys" to its name it still fits on one line).
>
> 7: No change
>
> 8: get_cpu_cacheinfo_level() moved to <linux/cacheinfo.h>
> currently in tip x86/cache
> no other changes
>
> 9: Dropped the "sumdomains" field from struct rmid_read (a NULL
> domain field now indicates that summing is needed).
> Fix kerneldoc comments for struct rmid_read.
> Updated commit comments with more "why" than "what".
>
> 10: No change
>
> 11: Fix commit comments per suggestions
> 1) Added some "why it is OK to take a bit from evtid"
> 2) s/The stolen bit is given to/Give the bit to/
> 3) Don't use "l3_cache_id" (which looks like a variable)
>
> 12: Fix commit message.
> s/kernfs_find_and_get_ns()/kernfs_find_and_get()/
> Add kernfs_put() to drop hold from kernfs_find_and_get()
> Drop useless "/* create the directory */" comment.
>
> 13: Add kernfs_put() to drop hold from kernfs_find_and_get() [two places]
>
> 14: Add cpumask parameter to mon_event_read() so SNC decisions are
> all in rdtgroup_mondata_show() instead of spread between functions.
> Add comments in rdtgroup_mondata_show() to explain the sum vs. no-sum
> cases.
> Moved the mon_event_read() call into both arms of the if-else
> instead of "d = NULL; goto got_cacheinfo;"
>
> 15: New (replaces 15-17). Make __mon_event_read() do the sum across
> domains (at filesystem level). Move the CPU/domain sanity check out
> of resctrl_arch_rmid_read() and into __mon_event_read()
> with separate scope tests for single domain vs. sum over
> domains.
>
> 16: [Was 18] Update commit message with details about MSR 0xCA0, what changes
> when bit 0 is cleared, and why this is necessary.
> Dropped "Add an architecture specific hook" language from
> commit message.
>
> 17: [Was 19] Drop "and enabling" from shortlog (enabling done by
> previous commit).
> Added checks that cpumask_weight() isn't returning zero (to keep
> static checkers from warning of possible divide by zero).
>
> 18: [Was 20] Fix some "Sub-NUMA" references to say "Sub-NUMA Cluster"
> Added document section on effect of SNC mode on MBA and L3 CAT.
>
> Tony Luck (18):
> x86/resctrl: Prepare for new domain scope
> x86/resctrl: Prepare to split rdt_domain structure
> x86/resctrl: Prepare for different scope for control/monitor
> operations
> x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
> x86/resctrl: Add node-scope to the options for feature scope
> x86/resctrl: Introduce snc_nodes_per_l3_cache
> x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster
> (SNC) systems
> x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files
> x86/resctrl: Add a new field to struct rmid_read for summation of
> domains
> x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function
> x86/resctrl: Allocate a new field in union mon_data_bits
> x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files
> x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC)
> mode
> x86/resctrl: Fill out rmid_read structure for smp_call*() to read a
> counter
> x86/resctrl: Make __mon_event_count() handle sum domains
> x86/resctrl: Enable RMID shared RMID mode on Sub-NUMA Cluster (SNC)
> systems
> x86/resctrl: Sub-NUMA Cluster (SNC) detection
> x86/resctrl: Update documentation with Sub-NUMA cluster changes
>
> Documentation/arch/x86/resctrl.rst | 27 ++
> include/linux/resctrl.h | 87 ++++--
> arch/x86/include/asm/msr-index.h | 1 +
> arch/x86/kernel/cpu/resctrl/internal.h | 93 +++++--
> arch/x86/kernel/cpu/resctrl/core.c | 312 ++++++++++++++++------
> arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 85 +++---
> arch/x86/kernel/cpu/resctrl/monitor.c | 242 ++++++++++++++---
> arch/x86/kernel/cpu/resctrl/pseudo_lock.c | 27 +-
> arch/x86/kernel/cpu/resctrl/rdtgroup.c | 272 ++++++++++++-------
> 9 files changed, 835 insertions(+), 311 deletions(-)
>
>
> base-commit: f385f024639431bec3e70c33cdbc9563894b3ee5

--
Thanks
Babu Moger