Re: [PATCH] arm64/mpam: Support partial-core boot for MPAM

From: Zeng Heng

Date: Tue Feb 03 2026 - 04:24:07 EST




On 2026/2/2 20:46, Zeng Heng wrote:
Hi Ben,

On 2026/2/2 19:34, Ben Horgan wrote:
Hi Zeng,

On 2/2/26 09:16, Zeng Heng wrote:


On 2026/2/2 16:41, Zeng Heng wrote:


On 2026/1/29 18:11, Ben Horgan wrote:
Hi Zeng,

I think I've just managed to whitelist your email address. So, all being
well I'll get your emails in my inbox.

On 1/7/26 03:13, Zeng Heng wrote:
Some MPAM MSCs (such as the L2 MSC) share the same power domain as their
associated CPUs. Therefore, in scenarios where only some of the cores
power up, the MSCs belonging to the unpowered cores do not need to be,
and must not be, accessed; otherwise a bus-access fault occurs.

The MPAM driver intentionally waits until all MSCs have been
discovered before allowing MPAM to be used, so that it can check the
properties of all the MSCs and determine the configuration with full
knowledge. Once a CPU affine with each MSC has been enabled, MPAM
will be enabled and usable.

Suppose we weren't to access all MSCs in an asymmetric configuration.
E.g. if different L2s had different lengths of cache portion bitmaps and
MPAM was enabled with only the CPUs sharing the same L2, then the driver
wouldn't know, and we'd end up with a bad configuration which would
become a problem when the other CPUs are eventually turned on.

Hence, I think we should retain the restriction that MPAM is only
enabled once all MSCs are probed. Is this a particularly onerous
restriction for you?


I have no objection to the restriction that "MPAM is only enabled once
all MSC are probed." This constraint ensures the driver has complete
knowledge of all Memory System Components before establishing the
configuration.


However, this patch is specifically designed to address CPU core
isolation scenarios (such as adding the 'isolcpus=xx' kernel command-line
parameter).

In the isolation scenario, are you enabling MPAM for some CPUs and using
those CPUs without taking the parameters of the associated MSCs into account?

In the CPU core isolation scenario, the CPU affinity information of each
MSC must be reported. In fact, the ACPI MPAM table already defines a
mechanism for reporting the affinity information of each MSC instance.

Through the "Hardware ID of linked device" and "Instance ID of linked
device" fields, the container and container ID to which the MSC belongs
are specified respectively, from which the MSC affinity information is
obtained.

The kernel is responsible for parsing this information and determining
which MSCs should be initialized based on the currently online CPUs.



The patch allows the MPAM driver to successfully complete the
initialization of online MSCs even when the system is booted with
certain cores isolated or disabled. The patch ensures that MPAM
initialization is decoupled from the requirement that all CPUs must be
online during the probing phase.

CPU core isolation is indeed a common production scenario. This
functionality requires the kernel to enable features even in the
presence of faulty cores (which cannot be recovered through a cold
boot). This ensures system reliability and availability on multi-core
processors where single-core faults occur.

Without this patch, MPAM cannot be initialized under CPU core isolation
scenarios. Apologies for not mentioning it in the patch: the
functionality can be verified by adding 'maxcpus=1' to the boot parameters.

For 'maxcpus=1' I think the correct behaviour is to not enable MPAM, as
the other CPUs can then be turned on afterwards, e.g. by
echo 1 > /sys/devices/system/cpu/cpuX/online

For faulty cores how would you ensure they are never turned on?


'maxcpus=1' is merely an extreme simulation scenario. In production
environments, detected faulty cores have already been disabled by the
BIOS firmware and cannot be brought online again.



Even if the faulty cores or offline CPUs are turned on later, the patch
does not affect the automatic recovery and bring-up of the MPAM MSCs.

Adding 'maxcpus=1' to the boot parameters, the test results with the
patch applied are as follows:

# mount -t resctrl resctrl /sys/fs/resctrl/
# cat /sys/fs/resctrl/schemata
L2:4=ff
L3:1=1ffff

# echo 1 > /sys/devices/system/cpu/cpu2/online
# cat /sys/fs/resctrl/schemata
L2:4=ff;7=ff
L3:1=1ffff

# echo 1 > /sys/devices/system/cpu/cpu16/online
# cat /sys/fs/resctrl/schemata
L2:4=ff;7=ff;29=ff
L3:1=1ffff;26=1ffff

# echo 0 > /sys/devices/system/cpu/cpu16/online
# cat /sys/fs/resctrl/schemata
L2:4=ff;7=ff
L3:1=1ffff

# echo 0 > /sys/devices/system/cpu/cpu2/online
# cat /sys/fs/resctrl/schemata
L2:4=ff
L3:1=1ffff



Best Regards,
Zeng Heng