Re: [PATCH v20 06/18] x86/resctrl: Introduce snc_nodes_per_l3_cache

From: Reinette Chatre
Date: Tue Jun 18 2024 - 18:58:34 EST


Hi Babu and Tony,

On 6/17/24 3:36 PM, Moger, Babu wrote:
On 6/10/2024 1:35 PM, Tony Luck wrote:

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 89d7e6fcbaa1..f2fd35d294f2 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -97,6 +97,8 @@ unsigned int resctrl_rmid_realloc_limit;
  #define CF(cf)    ((unsigned long)(1048576 * (cf) + 0.5))
+static int snc_nodes_per_l3_cache = 1;
+
  /*
   * The correction factor table is documented in Documentation/arch/x86/resctrl.rst.
   * If rmid > rmid threshold, MBM total and local values should be multiplied
@@ -185,7 +187,43 @@ static inline struct rmid_entry *__rmid_entry(u32 idx)
      return entry;
  }
-static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
+/*
+ * When Sub-NUMA Cluster (SNC) mode is not enabled (as indicated by
+ * "snc_nodes_per_l3_cache  == 1") no translation of the RMID value is
+ * needed. The physical RMID is the same as the logical RMID.
+ *
+ * On a platform with SNC mode enabled, Linux enables RMID sharing mode
+ * via MSR 0xCA0 (see the "RMID Sharing Mode" section in the "Intel
+ * Resource Director Technology Architecture Specification" for a full
+ * description of RMID sharing mode).
+ *
+ * In RMID sharing mode there are fewer "logical RMID" values available
+ * to accumulate data ("physical RMIDs" are divided evenly between SNC
+ * nodes that share an L3 cache). Linux creates an rdt_mon_domain for
+ * each SNC node.
+ *
+ * The value loaded into IA32_PQR_ASSOC is the "logical RMID".
+ *
+ * Data is collected independently on each SNC node and can be retrieved
+ * using the "physical RMID" value computed by this function and loaded
+ * into IA32_QM_EVTSEL. @cpu can be any CPU in the SNC node.
+ *
+ * The scope of the IA32_QM_EVTSEL and IA32_QM_CTR MSRs is at the L3
+ * cache.  So a "physical RMID" may be read from any CPU that shares
+ * the L3 cache with the desired SNC node, not just from a CPU in
+ * the specific SNC node.
+ */
+static int logical_rmid_to_physical_rmid(int cpu, int lrmid)

How about ? (or something similar)

static int get_snc_node_rmid(int cpu, int rmid)

+{
+    struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+    if (snc_nodes_per_l3_cache  == 1)

(nit: unnecessary space)

+        return lrmid;
+
+    return lrmid + (cpu_to_node(cpu) % snc_nodes_per_l3_cache) * r->num_rmid;
+}
+
+static int __rmid_read_phys(u32 prmid, enum resctrl_event_id eventid, u64 *val)

You don't need to write new function.  Just update the rmid.


  {
      u64 msr_val;
@@ -197,7 +235,7 @@ static int __rmid_read(u32 rmid, enum resctrl_event_id eventid, u64 *val)
       * IA32_QM_CTR.Error (bit 63) and IA32_QM_CTR.Unavailable (bit 62)
       * are error bits.
       */
-    wrmsr(MSR_IA32_QM_EVTSEL, eventid, rmid);
+    wrmsr(MSR_IA32_QM_EVTSEL, eventid, prmid);
      rdmsrl(MSR_IA32_QM_CTR, msr_val);
      if (msr_val & RMID_VAL_ERROR)
@@ -233,14 +271,17 @@ void resctrl_arch_reset_rmid(struct rdt_resource *r, struct rdt_mon_domain *d,
                   enum resctrl_event_id eventid)
  {
      struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
+    int cpu = cpumask_any(&d->hdr.cpu_mask);
      struct arch_mbm_state *am;
+    u32 prmid;

snc_rmid?

      am = get_arch_mbm_state(hw_dom, rmid, eventid);
      if (am) {
          memset(am, 0, sizeof(*am));
+        prmid = logical_rmid_to_physical_rmid(cpu, rmid);
          /* Record any initial, non-zero count value. */
-        __rmid_read(rmid, eventid, &am->prev_msr);
+        __rmid_read_phys(prmid, eventid, &am->prev_msr);

How about ? Feel free to simplify.

          if (snc_nodes_per_l3_cache > 1) {
                 snc_rmid = get_snc_node_rmid(cpu, rmid);
                __rmid_read(snc_rmid, eventid, &am->prev_msr);
          } else {
              __rmid_read(rmid, eventid, &am->prev_msr);
          }


When considering something like this I think it would be better to contain the
SNC checking in a single place so that all places needing to read RMID need not
remember to have the same copied "if (snc_nodes_per_l3_cache > 1)" check.
This then essentially becomes logical_rmid_to_physical_rmid() in this patch so
now it just becomes a question about what name to pick for variables and functions.

I do prefer a name like __rmid_read_phys() with a unique "prmid" parameter since that
should prompt developer to give a second thought to what rmid parameter is provided
instead of just blindly calling __rmid_read() that implies that it is reading the
data for the RMID used by resctrl without considering that a conversion may be needed.

I do understand and agree that "logical" vs "physical" is not intuitive here but
to that end I find that the comments explain the distinction well. If there are
better suggestions then they are surely welcome.

In summary, I do think that the "__rmid_read()" function needs a name change to make
clear that it may not be reading the RMID used internally by resctrl and this function
should be accompanied by a function with similar term in its name that does the
conversion and includes the SNC check.

Reinette