Re: [PATCH v2 00/16] fs,x86/resctrl: Add kernel-mode (e.g., PLZA) support to the resctrl subsystem
From: Babu Moger
Date: Mon Mar 30 2026 - 14:49:42 EST
Hi Reinette,
On 3/27/26 17:11, Reinette Chatre wrote:
Hi Babu,
Ok. That is fine. We can do that.
On 3/26/26 10:12 AM, Babu Moger wrote:
Hi Reinette,
The "both ways" are specific to one of the two active modes though.
Thanks for the review comments. Will address one by one.
On 3/24/26 17:51, Reinette Chatre wrote:
Hi Babu,
Sure. Will try. Let's continue the discussion.
On 3/12/26 1:36 PM, Babu Moger wrote:
This series adds support for Privilege-Level Zero Association (PLZA) to the
resctrl subsystem. PLZA is an AMD feature that allows specifying a CLOSID
and/or RMID for execution in kernel mode (privilege level zero), so that
kernel work is not subject to the same resource constraints as the current
user-space task. This avoids kernel operations being aggressively throttled
when a task's memory bandwidth is heavily limited.
The feature documentation is not yet publicly available, but it is expected
to be released in the next few weeks. In the meantime, a brief description
of the features is provided below.
Privilege Level Zero Association (PLZA)
Privilege Level Zero Association (PLZA) allows the hardware to
automatically associate execution in Privilege Level Zero (CPL=0) with a
specific COS (Class of Service) and/or RMID (Resource Monitoring
Identifier). The QoS feature set already has a mechanism to associate
execution on each logical processor with an RMID or COS. PLZA allows the
system to override this per-thread association for a thread that is
executing with CPL=0.
------------------------------------------------------------------------
The series introduces the feature in a way that supports the interface in
a generic manner to accommodate MPAM or other vendor-specific implementations.
Below are the detailed requirements provided by Reinette:
https://lore.kernel.org/lkml/2ab556af-095b-422b-9396-f845c6fd0342@xxxxxxxxx/
Our discussion considered how resctrl could support PLZA in a generic way while
also preparing to support MPAM's variants and how PLZA may evolve to have similar
capabilities when considering the capabilities of its registers.
This does not mean that your work needs to implement everything that was discussed.
Instead, this work is expected to just support what PLZA is capable of today but
do so in a way that the future enhancements could be added to.
This series is quite difficult to follow since it appears to implement a full
featured generic interface while PLZA cannot take advantage of it.
Could you please simplify this work to focus on just enabling PLZA and only
add interfaces needed to do so?
Summary:
1. Kernel-mode/PLZA controls and status should be exposed under the resctrl
info directory: /sys/fs/resctrl/info/, not as a separate or arch-specific path.
2. Add two info files
a. kernel_mode
Purpose: Control how resource allocation and monitoring apply in kernel mode
(e.g. inherit from task vs global assign).
Read: List supported modes and show current one (e.g. with [brackets]).
Write: Set current mode by name (e.g. inherit_ctrl_and_mon, global_assign_ctrl_assign_mon).
b. kernel_mode_assignment
Purpose: When a “global assign” kernel mode is active, specify which resctrl group
(CLOSID/RMID) is used for kernel work.
Read: Show the assigned group in a path-like form (e.g. //, ctrl1//, ctrl1/mon1/).
Write: Assign or clear the group used for kernel mode (and optionally clear with an empty write).
The patches are based on top of commit (v7.0.0-rc3)
839e91ce3f41b (tip/master) Merge branch into tip/master: 'x86/tdx'
------------------------------------------------------------------------
Examples: kernel_mode and kernel_mode_assignment
All paths below are under /sys/fs/resctrl/ (e.g. info/kernel_mode means
/sys/fs/resctrl/info/kernel_mode). Resctrl must be mounted and the platform
must support the relevant modes (e.g. AMD with PLZA).
1) kernel_mode — show and set the current kernel mode
Read supported modes and which one is active (current in brackets):
$ cat info/kernel_mode
[inherit_ctrl_and_mon]
global_assign_ctrl_inherit_mon
global_assign_ctrl_assign_mon
Set the active mode (e.g. use one CLOSID+RMID for all kernel work):
$ echo "global_assign_ctrl_assign_mon" > info/kernel_mode
$ cat info/kernel_mode
inherit_ctrl_and_mon
global_assign_ctrl_inherit_mon
[global_assign_ctrl_assign_mon]
Mode meanings:
- inherit_ctrl_and_mon: kernel uses same CLOSID/RMID as the current task (default).
- global_assign_ctrl_inherit_mon: one CLOSID for all kernel work; RMID inherited from user.
- global_assign_ctrl_assign_mon: one resource group (CLOSID+RMID) for all kernel work.
2) kernel_mode_assignment — show and set which group is used for kernel work
Only relevant when kernel_mode is not "inherit_ctrl_and_mon". Read the
currently assigned group (path format is "CTRL_MON/MON/"):
To help with future usages please connect visibility of this file with the mode in
info/kernel_mode. This helps us to support future modes with other resctrl files, possibly
within each resource group.
Specifically, kernel_mode_assignment is not visible to user space if mode is "inherit_ctrl_and_mon",
while it is visible when mode is global_assign_ctrl_inherit_mon or global_assign_ctrl_assign_mon.
Sure. Will do.
The format depends on the mode, right? If the mode is "global_assign_ctrl_inherit_mon"
then it should only contain a control group, alternatively, if the mode is
"global_assign_ctrl_assign_mon" then it contains control and mon group. This gives
resctrl future flexibility to change format for future modes.
This can be done both ways. The whole purpose of these groups is to get the CLOSID and RMID to enable PLZA. The user can echo a CTRL_MON or MON group to kernel_mode_assignment in any of the modes. We can decide what needs to be updated in the MSR (PQR_PLZA_ASSOC) based on what kernel mode is selected.
PLZA only needs the RMID when the mode is "global_assign_ctrl_assign_mon".
Displaying and parsing monitor group when the mode is
"global_assign_ctrl_inherit_mon" creates an inconsistent interface since the mode
only uses a control group. The interface to user space should match the mode otherwise
it becomes confusing.
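To make the mode-dependent format concrete, here is a minimal user-space sketch in Python (not kernel code). It parses the "CTRL_MON/MON/" strings shown in the cover letter and applies one possible policy: a monitor group is rejected when the mode only uses a control group. The function names are hypothetical, and the thread has not settled what the control-only string should look like, so the rejection rule is an assumption.

```python
def parse_assignment(text):
    """Split a 'CTRL_MON/MON/' string into (ctrl, mon).

    An empty component stands for the default group, so '//' -> ('', '').
    """
    parts = text.strip().split("/")
    # "ctrl1/mon1/" splits into ["ctrl1", "mon1", ""]
    if len(parts) != 3 or parts[2] != "":
        raise ValueError(f"expected CTRL_MON/MON/ format, got {text!r}")
    return parts[0], parts[1]


def check_assignment(mode, text):
    """Validate an assignment string against the active kernel mode.

    Assumed policy (not decided in the thread): the control-only mode
    refuses a non-empty monitor group instead of silently ignoring it.
    """
    ctrl, mon = parse_assignment(text)
    if mode == "inherit_ctrl_and_mon":
        raise ValueError("no assignment applies in inherit mode")
    if mode == "global_assign_ctrl_inherit_mon":
        if mon:
            raise ValueError("this mode uses a control group only")
        return ctrl, None
    if mode == "global_assign_ctrl_assign_mon":
        return ctrl, mon
    raise ValueError(f"unknown mode {mode!r}")
```

With this policy, "ctrl1/mon1/" is accepted in global_assign_ctrl_assign_mon and refused in global_assign_ctrl_inherit_mon, which is the consistency between mode and interface that the review asks for.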
...
Tony suggested using global variables to store the kernel mode
CLOSID and RMID. However, the kernel mode CLOSID and RMID are
coming from rdtgroup structure with the new interface. Accessing
them requires holding the associated lock, which would make the
context switch path unnecessarily expensive. So, dropped the idea.
https://lore.kernel.org/lkml/aXuxVSbk1GR2ttzF@agluck-desk3/
Let me know if there are other ways to optimize this.
I do not see why the context switch path needs to be touched at all with this
implementation. Since PLZA only supports global assignment does it not mean that resctrl
only needs to update PQR_PLZA_ASSOC when user writes to info/kernel_mode and
info/kernel_mode_assignment?
Each thread has an MSR to configure whether to associate privilege level zero execution with a separate COS and/or RMID, and the value of the COS and/or RMID. PLZA may be enabled or disabled on a per-thread basis. However, the COS and RMID association and configuration must be the same for all threads in the QOS Domain.
Based on previous comment in https://lore.kernel.org/lkml/abb049fa-3a3d-4601-9ae3-61eeb7fd8fcf@xxxxxxx/
and this implementation all fields of PQR_PLZA_ASSOC except PQR_PLZA_ASSOC.plza_en must be the
same for all CPUs on the system, not just per QoS domain. Could you please confirm?
Sorry for the confusion. It is "per QoS domain".
All the fields of PQR_PLZA_ASSOC except PQR_PLZA_ASSOC.plza_en must be set to the same value for all HW threads in the QoS domain for consistent operation (per QoS domain).
Yes, I agree with your concerns. The goal here is to make the interface less disruptive while still addressing the different use cases.
A couple of points:
So, PQR_PLZA_ASSOC is a per thread MSR just like PQR_ASSOC.
Privilege-Level Zero Association (PLZA) allows the user to specify a COS and/or RMID associated with execution in Privilege-Level Zero. When enabled on a HW thread, when that thread enters Privilege-Level Zero, transactions associated with that thread will be associated with the PLZA COS and/or RMID. Otherwise, the HW thread will be associated with the COS and RMID identified by PQR_ASSOC.
More below.
Consider some of the scenarios:
resctrl mount with default state:
# cat info/kernel_mode
[inherit_ctrl_and_mon]
global_assign_ctrl_inherit_mon
global_assign_ctrl_assign_mon
# ls info/kernel_mode_assignment
ls: cannot access 'info/kernel_mode_assignment': No such file or directory
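For user-space tooling, the bracketed listing above can be parsed with a few lines of Python. This is only a sketch: the helper names are hypothetical, and the visibility rule encodes the behavior requested in this review (kernel_mode_assignment hidden while the mode is inherit_ctrl_and_mon), not a finalized interface.

```python
def parse_kernel_mode(text):
    """Return (supported_modes, current_mode) from info/kernel_mode contents.

    The active mode is the one wrapped in [brackets]; current_mode is None
    if no line is bracketed.
    """
    modes, current = [], None
    for line in text.splitlines():
        name = line.strip()
        if not name:
            continue
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]
            current = name
        modes.append(name)
    return modes, current


def assignment_visible(current):
    """Requested rule: the assignment file exists only in a global_assign mode."""
    return current is not None and current != "inherit_ctrl_and_mon"
```

A tool could call parse_kernel_mode() on the file contents and skip any access to info/kernel_mode_assignment when assignment_visible() returns False, avoiding the "No such file or directory" case shown above.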
enable global_assign_ctrl_assign_mon mode:
# echo "global_assign_ctrl_assign_mon" > info/kernel_mode
Expectation here is that when user space sets this mode as above then resctrl would
in turn program MSR_IA32_PQR_PLZA_ASSOC on all CPUs to be:
MSR_IA32_PQR_PLZA_ASSOC.rmid=0
MSR_IA32_PQR_PLZA_ASSOC.rmid_en=1
MSR_IA32_PQR_PLZA_ASSOC.closid=0
MSR_IA32_PQR_PLZA_ASSOC.closid_en=1
MSR_IA32_PQR_PLZA_ASSOC.plza_en=1
I do not see why it is necessary to maintain any per-CPU or per-task state or needing
to touch the context switch code. Since PLZA only supports global could it not
just set MSR_IA32_PQR_PLZA_ASSOC on all online CPUs and be done with it?
Only caveat is that if a CPU is offline then this setting needs to be stashed
so that MSR_IA32_PQR_PLZA_ASSOC can be set when new CPU comes online.
The way that rdtgroup_config_kmode() introduced in patch #11 assumes it is dealing
with RDT_RESOURCE_L3 and traverses the resource domain list and resource group
CPU mask seems unnecessary to me as well as error prone since the system may only
have, for example, RDT_RESOURCE_MBA enabled or even just monitoring. Why not just set
MSR_IA32_PQR_PLZA_ASSOC on all CPUs and be done?
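The "set it on all online CPUs and stash it for CPUs that come online later" idea can be modeled in a few lines. The following Python sketch is a toy model, not kernel code: the class and method names are hypothetical, and the MSR is represented as a plain record of the five fields named in this thread rather than a real register layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlzaAssoc:
    """Field names follow the thread; bit positions are deliberately not modeled."""
    closid: int = 0
    rmid: int = 0
    closid_en: bool = False
    rmid_en: bool = False
    plza_en: bool = False


class GlobalPlza:
    """Toy model: one stashed value, mirrored into every online CPU."""

    def __init__(self, online_cpus):
        self.stash = PlzaAssoc()
        # Per-CPU register contents, tracked for online CPUs only.
        self.msr = {cpu: self.stash for cpu in online_cpus}

    def set_global(self, value):
        # Called when user space writes kernel_mode or kernel_mode_assignment.
        self.stash = value
        for cpu in self.msr:
            self.msr[cpu] = value

    def cpu_online(self, cpu):
        # A CPU coming online picks up the stashed setting.
        self.msr[cpu] = self.stash

    def cpu_offline(self, cpu):
        self.msr.pop(cpu, None)
```

Nothing in this model depends on which task is running, which is why this variant needs no per-task state and no context switch hook.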
To continue the scenarios ...
After user's setting above related files read:
# cat info/kernel_mode
inherit_ctrl_and_mon
global_assign_ctrl_inherit_mon
[global_assign_ctrl_assign_mon]
# cat info/kernel_mode_assignment
//
Modify group used by global_assign_ctrl_assign_mon mode:
# echo 'ctrl1/mon1/' > info/kernel_mode_assignment
Expectation here is that when user space sets this then resctrl would
program MSR_IA32_PQR_PLZA_ASSOC on all CPUs to be:
MSR_IA32_PQR_PLZA_ASSOC.rmid=<rmid of mon1>
MSR_IA32_PQR_PLZA_ASSOC.rmid_en=1
MSR_IA32_PQR_PLZA_ASSOC.closid=<closid of ctrl1>
MSR_IA32_PQR_PLZA_ASSOC.closid_en=1
MSR_IA32_PQR_PLZA_ASSOC.plza_en=1
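The expected register contents for each mode can be summarized as a small mapping. The Python sketch below reproduces the values listed above for global_assign_ctrl_assign_mon. The other two rows are assumptions inferred from the discussion (no RMID override when monitoring is inherited, PLZA off when everything is inherited), and the function name is hypothetical.

```python
def plza_fields(mode, closid=0, rmid=0):
    """Return the PQR_PLZA_ASSOC field values expected for a kernel mode.

    closid/rmid default to 0, i.e. the default resource group ('//').
    """
    if mode == "inherit_ctrl_and_mon":
        # Assumption: PLZA is simply disabled, PQR_ASSOC stays in effect.
        return dict(closid=0, closid_en=0, rmid=0, rmid_en=0, plza_en=0)
    if mode == "global_assign_ctrl_inherit_mon":
        # Assumption: only the CLOSID is overridden, the RMID is inherited.
        return dict(closid=closid, closid_en=1, rmid=0, rmid_en=0, plza_en=1)
    if mode == "global_assign_ctrl_assign_mon":
        # Matches the values listed in the scenarios above.
        return dict(closid=closid, closid_en=1, rmid=rmid, rmid_en=1, plza_en=1)
    raise ValueError(f"unknown mode {mode!r}")
```

Writing 'ctrl1/mon1/' to kernel_mode_assignment would then correspond to calling plza_fields() with the CLOSID of ctrl1 and the RMID of mon1 and applying the result to every CPU.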
This works correctly when PLZA associations are defined per CPU. For example, let's assume that *ctrl1* is assigned *CLOSID 1*.
In this scenario, every task in the system running on any CPU will use the limits associated with *CLOSID 1* whenever it enters Privilege-Level Zero, because the CPU's *PQR_PLZA_ASSOC* register has PLZA enabled and CLOSID is 1.
Now consider task-based association:
We have two resctrl groups:
* ctrl1 -> CLOSID 1 -> task1.plza = 1: the user wants PLZA enabled
for this task.
* ctrl2 -> CLOSID 2 -> task2.plza = 0: the user wants PLZA
disabled for this task.
Suppose *task1* is first scheduled on *CPU 0*. This behaves as expected: since CPU 0's *PQR_PLZA_ASSOC* contains *CLOSID 1, plza_en = 1*, task1 will use the limits from CLOSID 1 when it enters Privilege-Level Zero.
However, if *task2* later runs on *CPU 0*, we expect it to use *CLOSID 2* in both user mode and kernel mode, because the user has PLZA disabled for this task. But CPU 0 still has *CLOSID 1, plza_en = 1* in its PQR_PLZA_ASSOC register.
As a result, task2 will incorrectly run with *CLOSID 1* when entering Privilege-Level Zero, something we explicitly want to avoid.
At that point, PLZA must be disabled on CPU 0 to prevent the unintended association. Hope this explanation makes the issue clear.
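The task1/task2 scenario can be replayed with a toy model. This Python sketch is illustrative only: the per-task plza flag, the class names, and the MSR write counter are hypothetical and simply mirror the description above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    closid: int   # CLOSID of the task's resource group (user mode)
    plza: bool    # hypothetical per-task "use PLZA" flag


@dataclass
class Cpu:
    plza_closid: int        # CLOSID held in this CPU's PQR_PLZA_ASSOC
    plza_en: bool = True
    msr_writes: int = 0     # extra MSR writes caused by context switches


def kernel_closid(cpu, task):
    """CLOSID in effect when `task` runs at CPL=0 on `cpu`."""
    return cpu.plza_closid if cpu.plza_en else task.closid


def context_switch(cpu, incoming):
    """Make plza_en follow the incoming task, writing the MSR only on change."""
    if cpu.plza_en != incoming.plza:
        cpu.plza_en = incoming.plza
        cpu.msr_writes += 1
```

If task2 (CLOSID 2, plza off) runs on a CPU programmed for task1 without context_switch() being called, kernel_closid() still returns 1, which is the unintended association described above. Calling context_switch() first fixes the result at the cost of one extra MSR write.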
- Looks like we still need to come to agreement what is meant by "global" when it
comes to kernel mode.
In your description there is a "global" configuration, but the assignment is "per-task".
To me this sounds like a new and distinct kernel_mode from the "global" modes
considered so far. This seems to move to the "per_task" mode mentioned in but
the implementation does not take into account any of the earlier discussions
surrounding it:
https://lore.kernel.org/lkml/2ab556af-095b-422b-9396-f845c6fd0342@xxxxxxxxx/
We only learned about one use case in https://lore.kernel.org/lkml/CABPqkBSq=cgn-am4qorA_VN0vsbpbfDePSi7gubicpROB1=djw@xxxxxxxxxxxxxx/
As I understand this use case requires PLZA globally enabled for all tasks. Thus
I consider task assignment to be "global" when in the "global_*" kernel modes.
If this is indeed a common use case then supporting only global configuration
but then requiring user space to manually assign all tasks afterwards sounds
cumbersome for user space and also detrimental to system performance with all
the churn to modify all the task_structs involved. The accompanying documentation
does not mention all the additional interactions required from user
space to use this implementation.
I find this implementation difficult and inefficient to use in the one use case
we know of. I would suggest that resctrl optimizes for the one known use case.
- This implementation ignores discussion on how existing resctrl files should
not be repurposed.
This implementation allows user space to set a resource group in
kernel_mode_assignment with the consequence that this resource group's
"tasks" file changes behavior. I consider this a break of resctrl interface.
We did briefly consider per-task configuration/assignment in previous discussion
and the proposal was for it to use a new file (only when and if needed!).
- Now a user is required to write the task id of every task that participates
in PLZA. Apart from the churn already mentioned this also breaks existing
usage since it is no longer possible for new tasks to be added to this
resource group. This creates an awkward interface where all tasks belonging
to a resource group inherit the allocations/monitoring for their user space
work and will get PLZA enabled whether user requested it or not while
tasks from other resource groups need to be explicitly enabled. This creates
an inconsistency when it comes to task assignment. The only way to "remove"
PLZA from such a task would be to assign it to another resource group which
may not have the user space allocations ... and once this is done the task
cannot be moved back.
There is no requirement that CLOSID/RMID should be dedicated to kernel work
but this implementation does so in an inconsistent way.
- Apart from the same issues as with repurposing of tasks file, why should same
CPU allocation be used for kernel and user space?
Background: Customers have identified an issue with the QoS
Bandwidth Control feature: when a CLOS is aggressively throttled
and execution transitions into kernel mode, kernel operations are
also subject to the same aggressive throttling.
Privilege-Level Zero Association (PLZA) allows a user to specify a COS and/or RMID to be used during execution at Privilege Level Zero. When PLZA is enabled on a hardware thread, any execution that enters Privilege Level Zero will have its transactions associated with the PLZA COS and/or RMID. Otherwise, the thread continues to use the COS and RMID specified by |PQR_ASSOC|. In other words, the hardware provides a dedicated COS and/or RMID specifically for kernel-mode execution.
There are multiple ways this feature can be applied. For simplicity, the discussion below focuses only on CLOSID.
1. Global PLZA enablement
PLZA can be configured as a global feature by setting |PQR_PLZA_ASSOC.closid = CLOSID| and |PQR_PLZA_ASSOC.plza_en = 1| on all threads in the system. A dedicated CLOSID is reserved for this purpose, and all CPU threads use its allocations whenever they enter Privilege Level Zero. This CLOSID does not need to be associated with any resctrl group. The user can explicitly enable or disable this feature. There is no context switch overhead but there is no flexibility with this approach.
2. Group-based PLZA allocation: PLZA is managed via a dedicated
resctrl group. A separate resctrl group can be created
specifically for PLZA, with a dedicated CLOSID used exclusively
for kernel mode execution. This approach can be further divided
into two association models:
i) CPU-based association
CPUs are assigned to the PLZA group, and PLZA is enabled only on those CPUs. This effectively creates a dedicated PLZA group. MSRs (|PQR_PLZA_ASSOC|) are programmed only when the user changes CPU assignments. This approach requires no changes to the context switch code and introduces no additional context switch overhead.
ii) Task-based association
Tasks are explicitly assigned by the user to the PLZA group. The task list needs to be updated whenever the user adds a new task. This also requires updates during task scheduling so that the MSRs (|PQR_PLZA_ASSOC|) are programmed on each context switch, which introduces additional context switch overhead.
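For the CPU-based model (2.i), the only time an MSR needs to be written is when the PLZA group's CPU set changes. A minimal Python sketch of that bookkeeping follows. The function name is hypothetical, and it only computes which CPUs would need their plza_en bit flipped.

```python
def cpu_assignment_updates(old_cpus, new_cpus):
    """Map each CPU whose membership changed to its new plza_en value.

    CPUs present in both sets (or in neither) need no MSR write.
    """
    old, new = set(old_cpus), set(new_cpus)
    return {cpu: (cpu in new) for cpu in old ^ new}
```

For example, moving the PLZA group from CPUs {0, 1} to {1, 2} touches only CPU 0 (disable) and CPU 2 (enable). The cost is paid once per assignment change, unlike the task-based model, which may write the MSR on every context switch.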
I tried to fit these requirements into the interface files in /sys/fs/resctrl/info/. I may have missed a few things while trying to achieve it. As usual, I am open to discussion and recommendations.
Thanks,
Babu