Re: [PATCH v3 0/9] arm_mpam: Introduce Narrow-PARTID feature
From: Zeng Heng
Date: Mon Apr 20 2026 - 04:20:08 EST
Hi Shaopeng,
On 2026/4/16 14:11, Shaopeng Tan (Fujitsu) wrote:
Hello Zeng Heng,
Hi Shaopeng,
On 2026/4/10 8:13, Shaopeng Tan (Fujitsu) wrote:
Hello Zeng Heng,
This series applies on top of the mpam_resctrl_glue_v5_debugfs branch of:
https://gitlab.arm.com/linux-arm/linux-bh.git
Background
==========
On x86, resctrl allows creating up to num_rmids monitoring groups under a
parent control group. However, ARM64 MPAM is currently limited by the PMG
(Performance Monitoring Group) count, which is typically much smaller than
the theoretical RMID limit. This creates a significant scalability gap: users
expecting fine-grained per-process or per-thread monitoring quickly exhaust
the PMG space, even when plenty of reqPARTIDs remain available.
The Narrow-PARTID feature, defined in the ARM MPAM architecture,
addresses this by associating reqPARTIDs with intPARTIDs through a
programmable many-to-one mapping. This allows the kernel to present more
logical monitoring contexts than the PMG count alone permits.
Design Overview
===============
The implementation extends the RMID encoding to carry reqPARTID
information:
RMID = reqPARTID * NUM_PMG + PMG
In this patchset, a monitoring group is uniquely identified by the combination of
reqPARTID and PMG. The closid is represented by intPARTID, which is exactly
the original PARTID.
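The encoding above can be sketched as a pair of helpers. This is only an
illustration of the arithmetic, not code from the series: the names
rmid_encode(), rmid_decode() and struct rmid_parts are hypothetical, and
num_pmg stands for NUM_PMG.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two fields packed into an RMID. */
struct rmid_parts {
	uint32_t req_partid;
	uint32_t pmg;
};

/* RMID = reqPARTID * NUM_PMG + PMG */
static inline uint32_t rmid_encode(uint32_t req_partid, uint32_t pmg,
				   uint32_t num_pmg)
{
	/* PMG must fit in the range [0, NUM_PMG). */
	assert(num_pmg > 0 && pmg < num_pmg);
	return req_partid * num_pmg + pmg;
}

/* Inverse of rmid_encode(): split an RMID back into reqPARTID and PMG. */
static inline struct rmid_parts rmid_decode(uint32_t rmid, uint32_t num_pmg)
{
	struct rmid_parts p;

	assert(num_pmg > 0);
	p.req_partid = rmid / num_pmg;
	p.pmg = rmid % num_pmg;
	return p;
}
```

For example, with NUM_PMG = 4, reqPARTID 3 and PMG 1 encode to RMID 13, and
decoding 13 gives back the same pair.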
For systems with homogeneous MSCs (all supporting Narrow-PARTID), the
driver exposes the full reqPARTID range directly. For heterogeneous systems
where some MSCs lack Narrow-PARTID support, the driver utilizes PARTIDs
beyond the intPARTID range as reqPARTIDs to expand monitoring capacity.
The sole exception is when MBA MSCs lack Narrow-PARTID support: their
percentage-based control mechanism prevents the use of PARTIDs as
reqPARTIDs.
Capacity Improvements
=====================
--------------------------------------------------------------------------
 The maximum        | Sub-monitoring groups             | System-wide
 number of          | under a control group             | monitoring groups
--------------------------------------------------------------------------
 Without            |                                   |
 reqPARTID          | PMG                               | intPARTID * PMG
--------------------------------------------------------------------------
 reqPARTID          |                                   |
 static allocation  | (reqPARTID // intPARTID) * PMG    | reqPARTID * PMG
--------------------------------------------------------------------------
 reqPARTID          |                                   |
 dynamic allocation | (reqPARTID - intPARTID + 1) * PMG | reqPARTID * PMG
--------------------------------------------------------------------------
Under MPAM, the number of reqPARTIDs is always greater than or equal to
the number of intPARTIDs.
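To make the capacity formulas concrete, here is a small sketch that evaluates
each of the three cases. The function and struct names are hypothetical, and
the formulas are copied as stated in the cover letter rather than derived from
the driver code.

```c
#include <assert.h>

/* Hypothetical result type: the two capacity figures for one scheme. */
struct mon_capacity {
	unsigned int per_ctrl_group;	/* sub-monitoring groups per control group */
	unsigned int system_wide;	/* monitoring groups across the system */
};

/* Without reqPARTID: PMG per control group, intPARTID * PMG system-wide. */
static struct mon_capacity cap_without_reqpartid(unsigned int num_intpartid,
						 unsigned int num_pmg)
{
	struct mon_capacity c = { num_pmg, num_intpartid * num_pmg };

	return c;
}

/* Static allocation: (reqPARTID / intPARTID) * PMG per control group. */
static struct mon_capacity cap_static(unsigned int num_reqpartid,
				      unsigned int num_intpartid,
				      unsigned int num_pmg)
{
	struct mon_capacity c;

	assert(num_intpartid > 0 && num_reqpartid >= num_intpartid);
	c.per_ctrl_group = (num_reqpartid / num_intpartid) * num_pmg;
	c.system_wide = num_reqpartid * num_pmg;
	return c;
}

/* Dynamic allocation: (reqPARTID - intPARTID + 1) * PMG per control group. */
static struct mon_capacity cap_dynamic(unsigned int num_reqpartid,
				       unsigned int num_intpartid,
				       unsigned int num_pmg)
{
	struct mon_capacity c;

	assert(num_intpartid > 0 && num_reqpartid >= num_intpartid);
	c.per_ctrl_group = (num_reqpartid - num_intpartid + 1) * num_pmg;
	c.system_wide = num_reqpartid * num_pmg;
	return c;
}
```

With made-up values of 64 reqPARTIDs, 16 intPARTIDs and 4 PMGs, the
per-control-group limit is 4 without reqPARTID, 16 with static allocation and
196 with dynamic allocation.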
If reqPARTID % intPARTID > 0, does that mean we can't fully use the reqPARTIDs?
Some chips have a large number of intPARTIDs. Is it possible to use only a limited number of intPARTIDs?
Thanks for the feedback.
Consider adding a boot parameter to allow users to limit the number of
intPARTIDs, and this parameter should satisfy the constraint:
reqPARTID % intPARTID == 0
Alternatively, we have also provided dynamic allocation of reqPARTIDs. This
approach would maximize resource utilization, raising the number of
monitoring groups under a control group to at most:
(reqPARTID - intPARTID + 1) * PMG
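For the boot-parameter option, one way the constraint
reqPARTID % intPARTID == 0 could be enforced is to round a requested limit down
to the nearest divisor of the reqPARTID count. This is only a sketch: the
helper name is hypothetical, and the round-down policy is an assumption, not
something the series implements.

```c
#include <assert.h>

/*
 * Hypothetical helper: pick the largest intPARTID limit that does not
 * exceed 'requested' and divides 'num_reqpartid' evenly, so that
 * num_reqpartid % limit == 0 holds. Falls back to 1, which always divides.
 */
static unsigned int pick_intpartid_limit(unsigned int num_reqpartid,
					 unsigned int requested)
{
	unsigned int n;

	assert(num_reqpartid > 0 && requested > 0);

	/* intPARTIDs can never outnumber reqPARTIDs. */
	if (requested > num_reqpartid)
		requested = num_reqpartid;

	for (n = requested; n > 1; n--) {
		if (num_reqpartid % n == 0)
			return n;
	}
	return 1;
}
```

For instance, with 64 reqPARTIDs a requested limit of 12 would be rounded down
to 8, while a requested limit of 16 would be accepted unchanged.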
As you know, Arm (Dave) has also implemented the PARTID narrowing feature [1].
Are there aspects of Arm's proposal that do not meet Huawei's requirements?
Also, could you tell me about your future plans?
[1] https://lore.kernel.org/lkml/20250117151033.1517882-1-Dave.Martin@xxxxxxx/
Huawei does not have any specific requirements or constraints for this
feature. When we initially tried to send this patch series, there was no
existing implementation of this capability in the community.
In addition to the static allocation approach similar to Dave's
proposal, we have also provided the dynamic allocation scheme mentioned
above. However, this would require moderate changes to the resctrl core
layer.
If the community has not yet implemented this feature and would be
willing to provide review feedback, we are committed to continuously
updating and iterating on this patch series for general MPAM hardware
platforms.
About two years ago, Fujitsu discussed the PARTID narrowing proposal with Arm.
A year ago, I ran Arm's PARTID narrowing implementation (static allocation approach) and confirmed that it worked properly.
Although Arm's implementation requires refactoring based on the latest MPAM branch,
would it be better to proceed with Arm's implementation?
Additionally, since implementing the dynamic allocation approach requires significant changes to the resctrl side,
how about posting the dynamic allocation approach only after the static allocation approach has been merged?
From a long-term perspective, the Narrow PARTID static allocation scheme
must maintain software extensibility for dynamic allocation.
To demonstrate this, we sent the dynamic allocation approach alongside
the static allocation scheme. This shows that under the static
allocation scheme, while keeping the RMID encoding intact, only a
small number of incremental code changes are required to achieve
reqPARTID dynamic allocation.
Kind regards,
Zeng Heng