Re: [PATCH v5 31/41] arm_mpam: resctrl: Add resctrl_arch_rmid_read() and resctrl_arch_reset_rmid()
From: Zeng Heng
Date: Mon Mar 09 2026 - 23:26:03 EST
Hi Ben,
On 2026/3/10 0:30, Ben Horgan wrote:
Hi Zeng,
On 3/7/26 09:29, Zeng Heng wrote:
Hi Ben,
On 2026/2/25 1:57, Ben Horgan wrote:
From: James Morse <james.morse@xxxxxxx>
resctrl uses resctrl_arch_rmid_read() to read counters. CDP emulation
means the counter may need reading in three different ways. The same
goes for reset.
The helpers behind the resctrl_arch_ functions will be re-used for the
ABMC equivalent functions.
Add the rounding helper for checking monitor values while we're here.
Tested-by: Gavin Shan <gshan@xxxxxxxxxx>
Tested-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
Tested-by: Peter Newman <peternewman@xxxxxxxxxx>
Tested-by: Zeng Heng <zengheng4@xxxxxxxxxx>
Reviewed-by: Shaopeng Tan <tan.shaopeng@xxxxxxxxxxxxxx>
Reviewed-by: Jonathan Cameron <jonathan.cameron@xxxxxxxxxx>
Signed-off-by: James Morse <james.morse@xxxxxxx>
Signed-off-by: Ben Horgan <ben.horgan@xxxxxxx>
---
[...]
+
+static int read_mon_cdp_safe(struct mpam_resctrl_mon *mon, struct mpam_component *mon_comp,
+ enum mpam_device_features mon_type,
+ int mon_idx, u32 closid, u32 rmid, u64 *val)
+{
+ if (cdp_enabled) {
While reviewing the resctrl limbo handling code, I noticed an issue in
__check_limbo() that could lead to premature RMID release when CDP is
enabled.
In __check_limbo(), RMIDs in limbo state undergo L3 occupancy checks
before being released. This check is performed via
resctrl_arch_rmid_read(), which on arm64 MPAM relies on the cdp_enabled
state to determine which PARTID to check.
The concern arises in the following scenario: the filesystem is mounted
with CDP enabled. During normal operation, some RMIDs enter limbo. On umount,
cdp_enabled is reset to false. __check_limbo() may then run and perform
L3 checks with cdp_enabled = false. This could cause RMIDs to be
incorrectly released from limbo while still effectively busy after
remount.
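As a rough illustration of the scenario above, here is a minimal user-space
sketch, not kernel code. It assumes that under CDP a closid maps to the PARTID
pair (2 * closid, 2 * closid + 1) and to PARTID closid otherwise, and that an
RMID counts as dirty while its occupancy is at or above a threshold.
NUM_PARTID, occupancy[], model_limbo_occupancy() and model_can_release() are
hypothetical names made up for the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_PARTID 8u	/* hypothetical number of hardware PARTIDs */

/* Hypothetical per-PARTID cache occupancy, standing in for MPAM monitors. */
static uint64_t occupancy[NUM_PARTID];

/* Stand-in for the cdp_enabled state discussed in this thread. */
static bool cdp_enabled;

/* Occupancy as a limbo check would see it for one closid. */
static uint64_t model_limbo_occupancy(uint32_t closid)
{
	if (cdp_enabled) {
		/* Assumed CDP layout: data on 2 * closid, code on 2 * closid + 1. */
		assert(2 * closid + 1 < NUM_PARTID);
		return occupancy[2 * closid] + occupancy[2 * closid + 1];
	}

	assert(closid < NUM_PARTID);
	return occupancy[closid];
}

/* The entry may leave limbo only once its occupancy is below the threshold. */
static bool model_can_release(uint32_t closid, uint64_t threshold)
{
	return model_limbo_occupancy(closid) < threshold;
}
```

In this model, an entry for closid 1 that still has occupancy on PARTID 3 is
held in limbo while cdp_enabled is true, but passes the check once cdp_enabled
has been reset to false, because only the unrelated PARTID 1 is read.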
I think a stale limbo list causes more problems than that. If you mount
with cdp disabled, cause some rmids to be dirty, unmount and then
remount with cdp enabled, then you may have some of the entries in the
upper half marked as busy, but when the limbo code checks them it ends up
using an out of range partid and may trigger an mpam error interrupt.
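The out of range case can be sketched in the same spirit. This is a
user-space model under assumptions, not the driver's code: it assumes 8
hardware PARTIDs and the CDP mapping of a closid to 2 * closid (data) and
2 * closid + 1 (code). NUM_PARTID and all model_ names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PARTID 8u	/* hypothetical number of hardware PARTIDs */

enum model_conf_type { MODEL_CDP_NONE, MODEL_CDP_DATA, MODEL_CDP_CODE };

/* Assumed closid to PARTID mapping for each configuration type. */
static uint32_t model_partid(uint32_t closid, enum model_conf_type type)
{
	switch (type) {
	case MODEL_CDP_DATA:
		return closid * 2;
	case MODEL_CDP_CODE:
		return closid * 2 + 1;
	default:
		return closid;
	}
}

static bool model_partid_in_range(uint32_t partid)
{
	return partid < NUM_PARTID;
}

/*
 * Would checking a limbo entry recorded for this closid stay within the
 * hardware PARTID range under the CDP mode of the current mount?
 */
static bool model_stale_entry_is_safe(uint32_t closid, bool cdp_now)
{
	if (!cdp_now)
		return model_partid_in_range(model_partid(closid, MODEL_CDP_NONE));

	return model_partid_in_range(model_partid(closid, MODEL_CDP_DATA)) &&
	       model_partid_in_range(model_partid(closid, MODEL_CDP_CODE));
}
```

With these numbers, closid 6 is valid on a mount without cdp, but a stale
entry for it checked after a remount with cdp would address PARTIDs 12 and 13,
which do not exist in the model.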
To avoid a stale list we could disable the limbo checking at unmount and
remake the bitmap at remount. This would involve some resctrl changes
which I will have a further look into. For now, to avoid the dependency
without a lot of patch churn in this series I think we can hide the cdp
enablement behind CONFIG_EXPERT. Does that sound ok to you?
Thanks,
Ben
Confirmed. Toggling between non-CDP and CDP mount modes leads to
out-of-range PARTID hardware errors and memory access violations. This
can cause MPAM to halt by provoking mpam_broken_work.
I agree that properly fixing this will require resctrl modifications to
handle the limbo state across mount cycles. Hiding CDP behind
CONFIG_EXPERT is acceptable as a short-term mitigation to prevent users
from hitting this bug accidentally.
Best regards,
Zeng Heng