[PATCH] x86,fs/resctrl: Prevent out-of-bounds access while offlining CPU when SNC enabled

From: Reinette Chatre

Date: Wed Jun 03 2026 - 16:37:15 EST


The architecture updates the cpu_mask in a domain's header to track which
online CPUs are associated with the domain. When this mask becomes empty,
the architecture takes the domain offline, which includes calling on
resctrl fs to offline the domain. If it is a monitoring domain in which
LLC occupancy is tracked, resctrl fs forces the limbo handler to release
all busy RMIDs.

The limbo handler reads the current event value associated with a busy
RMID regardless of whether the RMID is being checked as part of the
regular "is it still busy" check or will be force-released anyway. When
reading an RMID on a system with SNC enabled, the "logical RMID" is
converted to the "physical RMID". This conversion requires the NUMA node
ID of the resctrl monitoring domain, which is in turn determined by
querying the NUMA node ID of any CPU belonging to the monitoring domain.
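To make the dependency on a CPU concrete, the userspace sketch below models the logical-to-physical conversion. It assumes the physical RMID is formed by adding (node % SNC nodes per L3) times the per-node RMID count to the logical RMID. Every name here (model_cpu_node, model_snc_nodes_per_l3_cache, model_num_rmid, model_logical_to_physical_rmid) is a hypothetical stand-in, not the kernel's code.

```c
/*
 * Userspace model only. All names are hypothetical stand-ins for the
 * kernel's SNC state; the arithmetic is an assumed approximation of the
 * x86 logical-to-physical RMID mapping.
 */
#define MODEL_NR_CPUS 4

/* Hypothetical CPU -> NUMA node table (stand-in for cpu_to_node()). */
static const int model_cpu_node[MODEL_NR_CPUS] = { 0, 0, 1, 1 };

/* Hypothetical: two SNC nodes share one L3 cache. */
static const int model_snc_nodes_per_l3_cache = 2;

/* Hypothetical: number of logical RMIDs exposed per SNC node. */
static const int model_num_rmid = 64;

/*
 * Convert a logical RMID to a physical RMID. The NUMA node of @cpu
 * selects which slice of the physical RMID space is used, so a valid
 * CPU from the monitoring domain is required.
 */
static int model_logical_to_physical_rmid(int cpu, int lrmid)
{
	if (model_snc_nodes_per_l3_cache == 1)
		return lrmid;

	return lrmid + (model_cpu_node[cpu] % model_snc_nodes_per_l3_cache) *
		       model_num_rmid;
}
```

With this model, logical RMID 5 read through CPU 2 (node 1) maps to physical RMID 69, while the same logical RMID read through CPU 0 (node 0) stays 5.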

When the monitoring domain is going offline, its cpu_mask is empty.
cpumask_any() on the empty mask returns "nr_cpu_ids", so the NUMA node ID
query via cpu_to_node() is done with "nr_cpu_ids" as its argument,
resulting in an out-of-bounds access.
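The failure mode, and the guard this patch adds, can be sketched in userspace as below. The cpumask is modeled as a plain bitmap, and model_cpumask_any() is a hypothetical stand-in that returns MODEL_NR_CPU_IDS for an empty mask, following the description above. model_domain_node() is a hypothetical reduction of the SNC-aware read path, not the real resctrl_arch_rmid_read().

```c
#include <errno.h>

/* Hypothetical model: at most 4 possible CPUs, mask held in a bitmap. */
#define MODEL_NR_CPU_IDS 4

/* Hypothetical CPU -> NUMA node table, sized exactly MODEL_NR_CPU_IDS. */
static const int model_cpu_node[MODEL_NR_CPU_IDS] = { 0, 0, 1, 1 };

/*
 * Stand-in for cpumask_any(): return the first set CPU, or
 * MODEL_NR_CPU_IDS when the mask is empty.
 */
static int model_cpumask_any(unsigned long mask)
{
	for (int cpu = 0; cpu < MODEL_NR_CPU_IDS; cpu++)
		if (mask & (1UL << cpu))
			return cpu;
	return MODEL_NR_CPU_IDS;
}

/*
 * Stand-in for the SNC node lookup on the read path. Without the
 * empty-mask check, an empty mask would index model_cpu_node[] with
 * MODEL_NR_CPU_IDS, one past the end of the array.
 */
static int model_domain_node(unsigned long cpu_mask, int *node)
{
	int cpu;

	/* Mirrors the cpumask_empty() check added by this patch. */
	if (cpu_mask == 0)
		return -EINVAL;

	cpu = model_cpumask_any(cpu_mask);
	*node = model_cpu_node[cpu];
	return 0;
}
```

Returning -EINVAL before cpumask_any() is consulted means the out-of-range CPU number never reaches the node lookup.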

Refactor the limbo handler to skip reading the RMID when the RMID will be
force-released anyway. Add a safety check to the architecture's RMID
reader that returns -EINVAL if the domain's cpu_mask is empty, to protect
against this scenario.
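The control flow of the refactored limbo loop can be sketched in userspace as follows. It is a simplified model, not the __check_limbo() implementation: model_busy[], model_occupancy[], model_rmid_read() and model_realloc_threshold are hypothetical stand-ins for the limbo bookkeeping, the occupancy read and resctrl_rmid_realloc_threshold. The point it illustrates is that with force_free set no read happens, and rmid_dirty keeps its initial value of true, which is harmless because the RMID is released regardless.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the limbo state and occupancy reads. */
#define MODEL_NUM_RMID 4

static uint64_t model_occupancy[MODEL_NUM_RMID]; /* fake LLC occupancy */
static bool model_busy[MODEL_NUM_RMID];         /* RMID still in limbo */
static int model_reads;                         /* counts occupancy reads */
static const uint64_t model_realloc_threshold = 100;

/* Stand-in for the architecture's RMID reader; always succeeds here. */
static int model_rmid_read(int rmid, uint64_t *val)
{
	model_reads++;
	*val = model_occupancy[rmid];
	return 0;
}

/*
 * Sketch of the refactored loop: the occupancy is only read when the
 * RMID is not about to be force-released.
 */
static void model_check_limbo(bool force_free)
{
	for (int rmid = 0; rmid < MODEL_NUM_RMID; rmid++) {
		bool rmid_dirty = true;
		uint64_t val = 0;

		if (!model_busy[rmid])
			continue;

		if (!force_free) {
			if (model_rmid_read(rmid, &val))
				rmid_dirty = true;
			else
				rmid_dirty = (val >= model_realloc_threshold);
		}

		if (force_free || !rmid_dirty)
			model_busy[rmid] = false; /* release the RMID */
	}
}
```

In this model a regular pass reads every busy RMID and frees only those below the threshold, while a forced pass frees all remaining busy RMIDs without performing a single read.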

Fixes: e13db55b5a0d ("x86/resctrl: Introduce snc_nodes_per_l3_cache")
Reported-by: Sashiko <sashiko-bot@xxxxxxxxxx>
Closes: https://sashiko.dev/#/patchset/cover.1780456704.git.reinette.chatre%40intel.com?part=9
Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
---
Changes since v4:
- New patch
---
arch/x86/kernel/cpu/resctrl/monitor.c | 5 ++++
fs/resctrl/monitor.c | 39 +++++++++++++++------------
2 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index 9bf9d7e201aa..fb7024ae50e6 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -259,6 +259,11 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
if (!domain_header_is_valid(hdr, RESCTRL_MON_DOMAIN, RDT_RESOURCE_L3))
return -EINVAL;

+ if (cpumask_empty(&hdr->cpu_mask)) {
+ pr_warn_once("Domain %d has no CPUs\n", hdr->id);
+ return -EINVAL;
+ }
+
d = container_of(hdr, struct rdt_l3_mon_domain, hdr);
hw_dom = resctrl_to_arch_mon_dom(d);
cpu = cpumask_any(&hdr->cpu_mask);
diff --git a/fs/resctrl/monitor.c b/fs/resctrl/monitor.c
index 0e6a389a16bf..a932a1fea818 100644
--- a/fs/resctrl/monitor.c
+++ b/fs/resctrl/monitor.c
@@ -135,10 +135,10 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
struct rdt_resource *r = resctrl_arch_get_resource(RDT_RESOURCE_L3);
u32 idx_limit = resctrl_arch_system_num_rmid_idx();
struct rmid_entry *entry;
+ bool rmid_dirty = true;
u32 idx, cur_idx = 1;
void *arch_mon_ctx;
void *arch_priv;
- bool rmid_dirty;
u64 val = 0;

arch_priv = mon_event_all[QOS_L3_OCCUP_EVENT_ID].arch_priv;
@@ -161,22 +161,27 @@ void __check_limbo(struct rdt_l3_mon_domain *d, bool force_free)
break;

entry = __rmid_entry(idx);
- if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid, entry->rmid,
- QOS_L3_OCCUP_EVENT_ID, arch_priv, &val,
- arch_mon_ctx)) {
- rmid_dirty = true;
- } else {
- rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
-
- /*
- * x86's CLOSID and RMID are independent numbers, so the entry's
- * CLOSID is an empty CLOSID (X86_RESCTRL_EMPTY_CLOSID). On Arm the
- * RMID (PMG) extends the CLOSID (PARTID) space with bits that aren't
- * used to select the configuration. It is thus necessary to track both
- * CLOSID and RMID because there may be dependencies between them
- * on some architectures.
- */
- trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid, d->hdr.id, val);
+ if (!force_free) {
+ if (resctrl_arch_rmid_read(r, &d->hdr, entry->closid,
+ entry->rmid, QOS_L3_OCCUP_EVENT_ID,
+ arch_priv, &val, arch_mon_ctx)) {
+ rmid_dirty = true;
+ } else {
+ rmid_dirty = (val >= resctrl_rmid_realloc_threshold);
+
+ /*
+ * x86's CLOSID and RMID are independent numbers,
+ * so the entry's CLOSID is an empty CLOSID
+ * (X86_RESCTRL_EMPTY_CLOSID). On Arm the RMID
+ * (PMG) extends the CLOSID (PARTID) space with
+ * bits that aren't used to select the configuration.
+ * It is thus necessary to track both CLOSID and
+ * RMID because there may be dependencies between
+ * them on some architectures.
+ */
+ trace_mon_llc_occupancy_limbo(entry->closid, entry->rmid,
+ d->hdr.id, val);
+ }
}

if (force_free || !rmid_dirty) {
--
2.53.0