Re: [PATCH] x86/resctrl: Fix arch_mbm_* array overrun on SNC

From: Reinette Chatre
Date: Thu Aug 22 2024 - 12:34:28 EST


Hi Peter,

On 8/1/24 11:16 AM, Reinette Chatre wrote:
On 7/22/24 1:46 PM, Peter Newman wrote:
When using resctrl on systems with Sub-NUMA Clustering enabled,
monitoring groups may be allocated RMID values which would overrun the
arch_mbm_{local,total} arrays.

This is due to inconsistencies in whether the SNC-adjusted num_rmid
value or the unadjusted value in resctrl_arch_system_num_rmid_idx() is
used. The num_rmid value for the L3 resource is currently:

  resctrl_arch_system_num_rmid_idx() / snc_nodes_per_l3_cache

As a simple fix, make resctrl_arch_system_num_rmid_idx() return the
SNC-adjusted, L3 num_rmid value on x86.


Thank you very much for finding, root-causing, and providing a fix for
the issue.

Fixes: e13db55b5a0d ("x86/resctrl: Introduce snc_nodes_per_l3_cache")
Signed-off-by: Peter Newman <peternewman@xxxxxxxxxx>
---
  arch/x86/include/asm/resctrl.h     | 6 ------
  arch/x86/kernel/cpu/resctrl/core.c | 8 ++++++++
  include/linux/resctrl.h            | 3 +++
  3 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/resctrl.h b/arch/x86/include/asm/resctrl.h
index 12dbd2588ca7..8b1b6ce1e51b 100644
--- a/arch/x86/include/asm/resctrl.h
+++ b/arch/x86/include/asm/resctrl.h
@@ -156,12 +156,6 @@ static inline void resctrl_sched_in(struct task_struct *tsk)
          __resctrl_sched_in(tsk);
  }
-static inline u32 resctrl_arch_system_num_rmid_idx(void)
-{
-    /* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
-    return boot_cpu_data.x86_cache_max_rmid + 1;
-}
-
  static inline void resctrl_arch_rmid_idx_decode(u32 idx, u32 *closid, u32 *rmid)
  {
      *rmid = idx;
diff --git a/arch/x86/kernel/cpu/resctrl/core.c b/arch/x86/kernel/cpu/resctrl/core.c
index 1930fce9dfe9..8591d53c144b 100644
--- a/arch/x86/kernel/cpu/resctrl/core.c
+++ b/arch/x86/kernel/cpu/resctrl/core.c
@@ -119,6 +119,14 @@ struct rdt_hw_resource rdt_resources_all[] = {
      },
  };
+u32 resctrl_arch_system_num_rmid_idx(void)
+{
+    struct rdt_resource *r = &rdt_resources_all[RDT_RESOURCE_L3].r_resctrl;
+
+    /* RMID are independent numbers for x86. num_rmid_idx == num_rmid */
+    return r->num_rmid;
+}
+
  /*
   * cache_alloc_hsw_probe() - Have to probe for Intel haswell server CPUs
   * as they do not have CPUID enumeration support for Cache allocation.
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index b0875b99e811..43ac241471b3 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -248,6 +248,9 @@ struct resctrl_schema {
  /* The number of closid supported by this resource regardless of CDP */
  u32 resctrl_arch_get_num_closid(struct rdt_resource *r);
+
+u32 resctrl_arch_system_num_rmid_idx(void);
+

nit: the additional empty lines are unnecessary.

  int resctrl_arch_update_domains(struct rdt_resource *r, u32 closid);
  /*

| Reviewed-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>


It would be great if this fix can be included in the kernel release.
Could you please send V2 with nit and tag applied to be ready for
inclusion?

Thank you

Reinette