Re: [PATCH v5 31/41] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid()

From: Zeng Heng

Date: Sat Mar 07 2026 - 04:29:30 EST


Hi Ben,

On 2026/2/25 1:57, Ben Horgan wrote:
From: James Morse <james.morse@xxxxxxx>

resctrl uses resctrl_arch_rmid_read() to read counters. CDP emulation means
the counter may need reading in three different ways. The same goes for
reset.

The helpers behind the resctrl_arch_ functions will be re-used for the ABMC
equivalent functions.

Add the rounding helper for checking monitor values while we're here.

Tested-by: Gavin Shan <gshan@xxxxxxxxxx>
Tested-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
Tested-by: Peter Newman <peternewman@xxxxxxxxxx>
Tested-by: Zeng Heng <zengheng4@xxxxxxxxxx>
Reviewed-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
Reviewed-by: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>
Signed-off-by: James Morse <james.morse@xxxxxxx>
Signed-off-by: Ben Horgan <ben.horgan@xxxxxxx>
---

[...]

+
+static int read_mon_cdp_safe(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
+ enum mpam_device_features mon_type,
+ int mon_idx, u32 closid, u32 rmid, u64 *val)
+{
+ if (cdp_enabled) {

While reviewing the resctrl limbo handling code, I noticed a issue in
__check_limbo() that could lead to premature RMID release when CDP is
enabled.

In __check_limbo(), RMIDs in limbo state undergo L3 occupancy checks
before being released. This check is performed via
resctrl_arch_rmid_read(), on arm64 MPAM, which relies on the cdp_enabled
state to determine to check which PARTID.

The concern arises in the following scenario: Filesystem is mounted with
CDP enabled. During normal operation, some RMIDs enter limbo. On umount,
cdp_enabled is reset to false. __check_limbo() may then run and perform
L3 checks with cdp_enabled = false. This could cause RMIDs to be
incorrectly released from limbo while still effectively busy after
remount.

Apologies for not providing a ready-made fix in this email. However,
I would appreciate to hear the community's thoughts on this issue.


+ u64 code_val = 0, data_val = 0;
+ int err;
+
+ err = __read_mon(mon, mon_comp, mon_type, mon_idx,
+ CDP_CODE, closid, rmid, &code_val);
+ if (err)
+ return err;
+
+ err = __read_mon(mon, mon_comp, mon_type, mon_idx,
+ CDP_DATA, closid, rmid, &data_val);
+ if (err)
+ return err;
+
+ *val += code_val + data_val;
+ return 0;
+ }
+
+ return __read_mon(mon, mon_comp, mon_type, mon_idx,
+ CDP_NONE, closid, rmid, val);
+}
+
+/* MBWU when not in ABMC mode, and CSU counters. */
+int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_domain_hdr *hdr,
+ u32 closid, u32 rmid, enum resctrl_event_id eventid,
+ void *arch_priv, u64 *val, void *arch_mon_ctx)
+{
+ struct mpam_resctrl_dom *l3_dom;
+ struct mpam_component *mon_comp;
+ u32 mon_idx = *(u32 *)arch_mon_ctx;
+ enum mpam_device_features mon_type;
+ struct mpam_resctrl_mon *mon = &mpam_resctrl_counters[eventid];
+
+ resctrl_arch_rmid_read_context_check();
+
+ if (eventid >= QOS_NUM_EVENTS || !mon->class)
+ return -EINVAL;
+
+ l3_dom = container_of(hdr, struct mpam_resctrl_dom, resctrl_mon_dom.hdr);
+ mon_comp = l3_dom->mon_comp[eventid];
+
+ switch (eventid) {
+ case QOS_L3_OCCUP_EVENT_ID:
+ mon_type = mpam_feat_msmon_csu;
+ break;
+ case QOS_L3_MBM_LOCAL_EVENT_ID:
+ case QOS_L3_MBM_TOTAL_EVENT_ID:
+ mon_type = mpam_feat_msmon_mbwu;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ return read_mon_cdp_safe(mon, mon_comp, mon_type, mon_idx,
+ closid, rmid, val);
+}
+


Best regards,
Zeng Heng