Re: [RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling

From: Moger, Babu

Date: Mon Feb 23 2026 - 17:36:08 EST


Hi Reinette,

On 2/23/2026 11:12 AM, Reinette Chatre wrote:
Hi Babu,

On 2/20/26 2:44 PM, Moger, Babu wrote:
On 2/19/2026 8:53 PM, Reinette Chatre wrote:

Summary of considerations surrounding CLOSID/RMID (PARTID/PMG) assignment for kernel work
=========================================================================================

- PLZA currently only supports global assignment (only PLZA_EN of
   MSR_IA32_PQR_PLZA_ASSOC may differ on logical processors). Even so, current
   speculation is that RMID_EN=0 implies that user space RMID is used to monitor
   kernel work that could appear to user as "kernel mode" supporting multiple RMIDs.
   https://lore.kernel.org/lkml/abb049fa-3a3d-4601-9ae3-61eeb7fd8fcf@xxxxxxx/

Yes. RMID_EN=0 means don't use a separate RMID for PLZA.

Thank you very much for confirming.

...

How can resctrl support the requirements?
=========================================

New global resctrl fs files
===========================
info/kernel_mode (always visible)
info/kernel_mode_assignment (visibility and content depends on active setting in info/kernel_mode)

It is probably a good idea to drop "assign" for this work. We already have mbm_assign_mode and related work.

hmmm ... I think "assign" is generic enough of a word that
it cannot be claimed by a single feature.


info/kernel_mode_assoc or info/kernel_mode_association? Or we can wait and rename it appropriately later.

yes, naming can be settled later.


Sure.




info/kernel_mode
================
- Displays the currently active as well as possible features available to user
   space.
- Single place where user can query "kernel mode" behavior and capabilities of the
   system.
- Some possible values:
   - inherit_ctrl_and_mon <=== previously named "match_user", just renamed for consistency with other names
      When active, kernel and user space use the same CLOSID/RMID. The current status
      quo for x86.
   - global_assign_ctrl_inherit_mon
      When active, CLOSID/control group can be assigned for *all* (hence, "global")
      kernel work while all kernel work uses same RMID as user space.
      Can only be supported on architecture where CLOSID and RMID are independent.
      An arch may support this in hardware (RMID_EN=0?), or resctrl can do this during
      context switch if the RMID is independent and the context switch cost is
      considered "reasonable".
      This supports use case https://lore.kernel.org/lkml/CABPqkBSq=cgn-am4qorA_VN0vsbpbfDePSi7gubicpROB1=djw@xxxxxxxxxxxxxx/
      for PLZA.
   - global_assign_ctrl_assign_mon
      When active the same resource group (CLOSID and RMID) can be assigned to
      *all* kernel work. This could be any group, including the default group.
      There may not be a use case for this, but it could be useful as an intermediate
      step toward the mode that follows (more later).
   - per_group_assign_ctrl_assign_mon
      When active every resource group can be associated with another (or the same)
      resource group. This association maps the resource group for user space work
      to resource group for kernel work. This is similar to the "kernel_group" idea
      presented in:
      https://lore.kernel.org/lkml/aYyxAPdTFejzsE42@xxxxxxxxxxxxxxx/
      This addresses use case https://lore.kernel.org/lkml/CABPqkBSq=cgn-am4qorA_VN0vsbpbfDePSi7gubicpROB1=djw@xxxxxxxxxxxxxx/
      for MPAM.
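To make the differences between the modes concrete, here is a minimal user-space C sketch (not resctrl code) of which CLOSID/RMID pair kernel work would run with in each mode. The enum constants, struct ids and kernel_work_ids() are hypothetical names, and the sketch assumes CLOSID and RMID are independent, as required for global_assign_ctrl_inherit_mon.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical encoding of the modes listed in info/kernel_mode. */
enum assoc_mode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON,
	PER_GROUP_ASSIGN_CTRL_ASSIGN_MON,
};

/* CLOSID/RMID (PARTID/PMG on MPAM) pair; field names are illustrative. */
struct ids {
	uint32_t closid;
	uint32_t rmid;
};

/*
 * Pick the IDs used for kernel work of one task.
 * @user:   IDs of the task's own (user space) resource group
 * @global: IDs of the single group assigned for all kernel work
 * @pergrp: IDs of the kernel group mapped to the task's user space group
 */
static struct ids kernel_work_ids(enum assoc_mode mode, struct ids user,
				  struct ids global, struct ids pergrp)
{
	switch (mode) {
	case INHERIT_CTRL_AND_MON:
		/* Status quo on x86: kernel work runs with the user space IDs. */
		return user;
	case GLOBAL_ASSIGN_CTRL_INHERIT_MON:
		/* Dedicated CLOSID, but monitoring stays with the user space RMID. */
		return (struct ids){ .closid = global.closid, .rmid = user.rmid };
	case GLOBAL_ASSIGN_CTRL_ASSIGN_MON:
		/* One resource group (CLOSID and RMID) for all kernel work. */
		return global;
	case PER_GROUP_ASSIGN_CTRL_ASSIGN_MON:
		/* Kernel group selected via the task's user space group. */
		return pergrp;
	}
	assert(0 && "unknown kernel mode");
	return user;
}
```

Only global_assign_ctrl_inherit_mon mixes the two sources, which is why it requires CLOSID and RMID to be independent.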

All these new names and related information will go in a global structure.

Something like this:

struct kern_mode {
       enum assoc_mode mode;        /* active setting in info/kernel_mode */
       struct rdtgroup *k_rdtgrp;   /* group assigned for kernel work */
       /* ... */
};

Not sure what other information will be required here. I will know once I start working on it.

This structure will be updated based on what the user writes to "kernel_mode" and "kernel_mode_assignment".
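A minimal sketch of how such a structure could be updated, assuming the two fields above plus a trimmed-down stand-in for struct rdtgroup. kern_mode_set_mode(), kern_mode_assign() and default_group are hypothetical names; falling back to the default group on a mode change follows the default-assignment idea for info/kernel_mode_assignment. Locking and any hardware programming (for example PLZA MSRs) are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum assoc_mode {
	INHERIT_CTRL_AND_MON,
	GLOBAL_ASSIGN_CTRL_INHERIT_MON,
	GLOBAL_ASSIGN_CTRL_ASSIGN_MON,
};

/* Stand-in for the kernel's struct rdtgroup; only illustrative fields. */
struct rdtgroup {
	const char *name;
	unsigned int closid;
	unsigned int rmid;
};

struct kern_mode {
	enum assoc_mode mode;		/* active setting in info/kernel_mode */
	struct rdtgroup *k_rdtgrp;	/* group for kernel work, NULL if inherited */
};

static struct rdtgroup default_group = { .name = "/", .closid = 0, .rmid = 0 };
static struct kern_mode kmode = { .mode = INHERIT_CTRL_AND_MON };

/* Write to info/kernel_mode: use the default group until user space assigns one. */
static void kern_mode_set_mode(struct kern_mode *km, enum assoc_mode mode)
{
	km->mode = mode;
	km->k_rdtgrp = (mode == INHERIT_CTRL_AND_MON) ? NULL : &default_group;
}

/* Write to info/kernel_mode_assignment. */
static int kern_mode_assign(struct kern_mode *km, struct rdtgroup *grp)
{
	/* The assignment file is not visible in inherit mode. */
	if (km->mode == INHERIT_CTRL_AND_MON)
		return -EINVAL;
	assert(grp != NULL);
	km->k_rdtgrp = grp;
	return 0;
}
```

Keeping a group pointer rather than raw IDs lets the show handler print the group name while the CLOSID/RMID stay reachable through it.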

This looks to be a good start. I think keeping the rdtgroup association is good since
it helps to easily display the name to user space while also providing access to the CLOSID
and RMID that is assigned to the tasks.
By placing them in their own structure instead of just globals it does make it easier to
build on when some modes have different requirements wrt rdtgroup management.

I am not clear on this comment. Can you please elaborate a little bit?

Thanks
Babu


You may find that certain arrangements work better to support interactions with the
task structure in ways that are not clear at this time.




- Additional values can be added as new requirements arise, for example "per_task"
   assignment. Connecting visibility of info/kernel_mode_assignment to mode in
   info/kernel_mode enables resctrl to later support additional modes that may require
   different configuration files, potentially per-resource group like the "tasks_kernel"
   (or perhaps rather "kernel_mode_tasks" to have consistent prefix for this feature)
   and "cpus_kernel" ("kernel_mode_cpus"?) discussed in these threads.

So, per resource group file "kernel_mode_tasks" and "kernel_mode_cpus" are not required right now. Correct?

Correct. The way I see it the baseline implementation to support PLZA should be
straightforward. We'll probably spend a bit extra time on the supporting documentation
to pave the way for possible additions.

   User can view active and supported modes:

    # cat info/kernel_mode
    [inherit_ctrl_and_mon]
    global_assign_ctrl_inherit_mon
    global_assign_ctrl_assign_mon

User can switch modes:
    # echo global_assign_ctrl_inherit_mon > info/kernel_mode
    # cat info/kernel_mode
    inherit_ctrl_and_mon
    [global_assign_ctrl_inherit_mon]
    global_assign_ctrl_assign_mon
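The show/write behavior above could look roughly like the following user-space C sketch. The mode table, the mode_supported flags (here modeling a system without per-group support) and the function names are assumptions for illustration; the real handlers would be resctrl kernfs callbacks.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table of modes; each platform marks which ones it supports. */
static const char *const mode_names[] = {
	"inherit_ctrl_and_mon",
	"global_assign_ctrl_inherit_mon",
	"global_assign_ctrl_assign_mon",
	"per_group_assign_ctrl_assign_mon",
};
#define NR_MODES (sizeof(mode_names) / sizeof(mode_names[0]))

/* Example: a PLZA-like system without per-group support. */
static const int mode_supported[NR_MODES] = { 1, 1, 1, 0 };
static size_t active_mode;	/* index into mode_names, default inherit */

/* "cat info/kernel_mode": supported modes, the active one in brackets. */
static int kernel_mode_show(char *buf, size_t len)
{
	size_t off = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < NR_MODES; i++) {
		if (!mode_supported[i])
			continue;
		int n = snprintf(buf + off, len - off,
				 i == active_mode ? "[%s]\n" : "%s\n", mode_names[i]);
		if (n < 0 || (size_t)n >= len - off)
			return -ENOSPC;
		off += (size_t)n;
	}
	return 0;
}

/* "echo <mode> > info/kernel_mode"; a trailing newline is tolerated. */
static int kernel_mode_write(const char *input)
{
	size_t len = strcspn(input, "\n");

	for (size_t i = 0; i < NR_MODES; i++) {
		if (strlen(mode_names[i]) == len &&
		    !strncmp(input, mode_names[i], len)) {
			if (!mode_supported[i])
				return -EOPNOTSUPP;
			active_mode = i;
			return 0;
		}
	}
	return -EINVAL;
}
```

Returning a distinct error for a known but unsupported mode lets user space tell "not on this system" apart from a typo; whether resctrl would make that distinction is open.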


info/kernel_mode_assignment
===========================
- Visibility depends on active mode in info/kernel_mode.
- Content depends on active mode in info/kernel_mode.
- Syntax to identify resource groups can use the syntax created as part of earlier ABMC work
   that supports default group https://lore.kernel.org/lkml/cover.1737577229.git.babu.moger@xxxxxxx/
- Default CTRL_MON group and if relevant, the default MON group, can be the default
   assignment when user just changes the kernel_mode without setting the assignment.

info/kernel_mode_assignment when mode is global_assign_ctrl_inherit_mon
-----------------------------------------------------------------------
- info/kernel_mode_assignment contains single value that is the name of the control group
   used for all kernel work.
- CLOSID/PARTID used for kernel work is determined from the control group assigned
- default value is default CTRL_MON group
- no monitor group assignment, kernel work inherits user space RMID
- syntax is
     <CTRL_MON group> with "/" meaning default.
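A possible parser for this syntax, as a hedged user-space sketch: the group table, the CLOSID values and parse_ctrl_assignment() are made up. Only a CLOSID is resolved because in this mode kernel work keeps the task's RMID.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Stand-in for the list of CTRL_MON groups; names and CLOSIDs are made up. */
struct ctrl_group {
	const char *name;
	int closid;
};

static const struct ctrl_group ctrl_groups[] = {
	{ "",            0 },	/* default CTRL_MON group, written as "/" */
	{ "unthrottled", 1 },
	{ "g1",          2 },
};

/*
 * Parse info/kernel_mode_assignment in global_assign_ctrl_inherit_mon mode.
 * Returns the CLOSID for kernel work, -EINVAL for empty input or -ENOENT
 * for an unknown group. No RMID is parsed: kernel work keeps the user RMID.
 */
static int parse_ctrl_assignment(const char *input)
{
	size_t len = strcspn(input, "\n");

	if (len == 0)
		return -EINVAL;
	if (len == 1 && input[0] == '/')
		len = 0;	/* "/" selects the default group */

	for (size_t i = 0; i < sizeof(ctrl_groups) / sizeof(ctrl_groups[0]); i++) {
		if (strlen(ctrl_groups[i].name) == len &&
		    !strncmp(input, ctrl_groups[i].name, len))
			return ctrl_groups[i].closid;
	}
	return -ENOENT;
}
```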

info/kernel_mode_assignment when mode is global_assign_ctrl_assign_mon
-----------------------------------------------------------------------
- info/kernel_mode_assignment contains single value that is the name of the resource group
   used for all kernel work.
- Combined CLOSID/RMID or combined PARTID/PMG is set globally to be associated with all
   kernel work.
- default value is default CTRL_MON group
- syntax is
     <CTRL_MON group>/<MON group>/ with "//" meaning default control and default monitoring group.
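Splitting the "<CTRL_MON group>/<MON group>/" syntax could be sketched as below. struct group_ref, the 64-byte name limit and parse_ctrl_mon() are hypothetical, and empty components stand for the default groups. Looking the names up in the resource group hierarchy is not shown.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical parse result: empty strings denote the default groups. */
struct group_ref {
	char ctrl[64];
	char mon[64];
};

/*
 * Split "<CTRL_MON group>/<MON group>/" as written to
 * info/kernel_mode_assignment in global_assign_ctrl_assign_mon mode.
 * "//" selects the default control and default monitoring group.
 */
static int parse_ctrl_mon(const char *input, struct group_ref *ref)
{
	size_t len = strcspn(input, "\n");
	const char *first = memchr(input, '/', len);
	const char *second;
	size_t clen, mlen;

	if (!first)
		return -EINVAL;
	second = memchr(first + 1, '/', len - (size_t)(first + 1 - input));
	/* Exactly two separators, the second one terminating the input. */
	if (!second || (size_t)(second - input) != len - 1)
		return -EINVAL;

	clen = (size_t)(first - input);
	mlen = (size_t)(second - first - 1);
	if (clen >= sizeof(ref->ctrl) || mlen >= sizeof(ref->mon))
		return -ENAMETOOLONG;

	memcpy(ref->ctrl, input, clen);
	ref->ctrl[clen] = '\0';
	memcpy(ref->mon, first + 1, mlen);
	ref->mon[mlen] = '\0';
	return 0;
}
```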

info/kernel_mode_assignment when mode is per_group_assign_ctrl_assign_mon
-------------------------------------------------------------------------
- this presents the information proposed in https://lore.kernel.org/lkml/aYyxAPdTFejzsE42@xxxxxxxxxxxxxxx/
   within a single file for convenience and potential optimization when user space needs to make changes.
   Interface proposed in https://lore.kernel.org/lkml/aYyxAPdTFejzsE42@xxxxxxxxxxxxxxx/ is also an option
   and as an alternative a per-resource group "kernel_group" can be made visible when user space enables
   this mode.
- info/kernel_mode_assignment contains a mapping of every resource group to another resource group:
   <resource group for user space work>:<resource group for kernel work>
- all resource groups must be present in first field of this file
- Even though this is a "per group" setting, the expectation is that this will set the
   kernel work CLOSID/RMID for every task. This implies that writing to this file would need
   to take the tasklist_lock which, when held for too long, may impact other parts of the system.
   See https://lore.kernel.org/lkml/CALPaoCh0SbG1+VbbgcxjubE7Cc2Pb6QqhG3NH6X=WwsNfqNjtA@xxxxxxxxxxxxxx/
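Validating a whole write to this file could look like the sketch below. It assumes each line has the "<resource group for user space work>:<resource group for kernel work>" form, that group names use the "<ctrl>/<mon>/" syntax and never contain ':', and it uses a made-up static group list; parse_mapping() and group_index() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical list of existing groups in the "<ctrl>/<mon>/" syntax. */
static const char *const groups[] = { "//", "g1//", "g1/g1m1/", "kernel//" };
#define NR_GROUPS (sizeof(groups) / sizeof(groups[0]))

static int group_index(const char *name, size_t len)
{
	for (size_t i = 0; i < NR_GROUPS; i++)
		if (strlen(groups[i]) == len && !strncmp(groups[i], name, len))
			return (int)i;
	return -ENOENT;
}

/*
 * Validate a full write in per_group_assign_ctrl_assign_mon mode: one
 * "<user group>:<kernel group>" line per existing group, no duplicates.
 * On success map[u] holds the kernel group index for user group u.
 */
static int parse_mapping(const char *buf, int map[NR_GROUPS])
{
	bool seen[NR_GROUPS] = { false };

	while (*buf) {
		size_t len = strcspn(buf, "\n");
		const char *colon = memchr(buf, ':', len);
		int u, k;

		if (!colon)
			return -EINVAL;
		u = group_index(buf, (size_t)(colon - buf));
		k = group_index(colon + 1, len - (size_t)(colon - buf) - 1);
		if (u < 0 || k < 0 || seen[u])
			return -EINVAL;
		seen[u] = true;
		map[u] = k;
		buf += len;
		if (*buf == '\n')
			buf++;
	}
	/* Every resource group must appear in the first field. */
	for (size_t i = 0; i < NR_GROUPS; i++)
		if (!seen[i])
			return -EINVAL;
	return 0;
}
```

Because the whole mapping is validated before anything is applied, a rejected write never needs tasklist_lock; only a successful parse would go on to update tasks.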

This mode is currently not supported in the AMD PLZA implementation, but we have to keep the options open for future enhancements for MPAM. I am still learning about the MPAM requirements.


Scenarios supported
===================

Default
-------
For x86 I understand kernel work and user work to be done with the same CLOSID/RMID, which
implies that info/kernel_mode can always be visible and at least display:
    # cat info/kernel_mode
    [inherit_ctrl_and_mon]

info/kernel_mode_assignment is not visible in this mode.

I understand MPAM may have different defaults here so would like to understand better.

Dedicated global allocations for kernel work, monitoring same for user space and kernel (PLZA)
----------------------------------------------------------------------------------------------
Possible scenario with PLZA, not MPAM (see later):
1. Create group(s) to manage allocations associated with user space work
    and assign tasks/CPUs to these groups.
2. Create group to manage allocations associated with all kernel work.
    - For example,
    # mkdir /sys/fs/resctrl/unthrottled
    - No constraints from resctrl fs on interactions with files in this group. From resctrl
      fs perspective it is not "dedicated" to kernel work but just another resource group.

That is correct. We don't need to handle the group specially for kernel_mode while creating it. However, some handling will be required when the kernel_mode group is deleted: we need to move the tasks/CPUs back to the default group and update the global kernel_mode structure.
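The structure update on group removal could be sketched as follows. The hook name kern_mode_group_removed() and the trimmed-down structures are hypothetical; the default group is modeled by a variable named after resctrl's rdtgroup_default. Moving the tasks/CPUs out of the removed group and reprogramming per-CPU state are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Trimmed-down stand-ins; only the fields needed here. */
struct rdtgroup {
	const char *name;
	unsigned int closid;
};

struct kern_mode {
	int mode;			/* active info/kernel_mode setting */
	struct rdtgroup *k_rdtgrp;	/* group used for kernel work */
};

static struct rdtgroup rdtgroup_default = { "/", 0 };
static struct kern_mode kmode;

/* Hypothetical hook, called during group removal before @grp is freed. */
static void kern_mode_group_removed(struct kern_mode *km, struct rdtgroup *grp)
{
	assert(grp != &rdtgroup_default);	/* the default group cannot be removed */
	/* Never leave a dangling pointer: fall back to the default group. */
	if (km->k_rdtgrp == grp)
		km->k_rdtgrp = &rdtgroup_default;
}
```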

Good point, yes.


...

Dedicated global allocations for kernel work, monitoring same for user space and kernel (MPAM)
----------------------------------------------------------------------------------------------
1. User space creates resource and monitoring groups for user tasks:
    /sys/fs/resctrl <= User space default allocations
    /sys/fs/resctrl/g1 <= User space allocations g1
    /sys/fs/resctrl/g1/mon_groups/g1m1 <= User space monitoring group g1m1
    /sys/fs/resctrl/g1/mon_groups/g1m2 <= User space monitoring group g1m2
    /sys/fs/resctrl/g2 <= User space allocations g2
    /sys/fs/resctrl/g2/mon_groups/g2m1 <= User space monitoring group g2m1
    /sys/fs/resctrl/g2/mon_groups/g2m2 <= User space monitoring group g2m2

2. User space creates resource and monitoring groups for kernel work (system has two PMGs):
    /sys/fs/resctrl/kernel                        <= Kernel space allocations
    /sys/fs/resctrl/kernel/mon_data               <= Kernel space monitoring for all of default and g1
    /sys/fs/resctrl/kernel/mon_groups/kernel_g2   <= Kernel space monitoring for all of g2
3. Set kernel mode to per_group_assign_ctrl_assign_mon:
    # echo per_group_assign_ctrl_assign_mon > info/kernel_mode
    - info/kernel_mode_assignment becomes visible and contains
    # cat info/kernel_mode_assignment
    //://
    g1//://
    g1/g1m1/://
    g1/g1m2/://
    g2//://
    g2/g2m1/://
    g2/g2m2/://
    - An optimization here may be to implement the change to per_group_assign_ctrl_assign_mon mode
      similar to the change to global_assign_ctrl_assign_mon, which initializes a global default. This
      avoids holding tasklist_lock for a long time to set every task's kernel CLOSID/RMID to the default
      only for user space to likely change it right after.
4. Set groups to be used for kernel work:
    # printf '//:kernel//\ng1//:kernel//\ng1/g1m1/:kernel//\ng1/g1m2/:kernel//\ng2//:kernel/kernel_g2/\ng2/g2m1/:kernel/kernel_g2/\ng2/g2m2/:kernel/kernel_g2/\n' > info/kernel_mode_assignment
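One way to realize the "initialize a global default" optimization, shown as a hypothetical sketch: a generation counter (an assumption here, not something proposed in this thread) invalidates all per-task kernel CLOSID/RMID values at once on a mode switch, so no walk of the task list under tasklist_lock is needed. Tasks with a stale generation fall back to the global default until user space assigns them. All names are made up, and concurrency (for example READ_ONCE()/WRITE_ONCE() on the globals) is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-task state; the kernel would keep this in task_struct. */
struct task_ids {
	uint32_t k_closid;	/* CLOSID for kernel work, valid if k_gen is current */
	uint32_t k_rmid;	/* RMID for kernel work, valid if k_gen is current */
	unsigned long k_gen;	/* generation in which k_closid/k_rmid were set */
};

/* Global default used by every task without a current per-task assignment. */
static uint32_t kernel_default_closid, kernel_default_rmid;
static unsigned long kernel_gen = 1;	/* starts at 1: zeroed tasks use the default */

/* Mode switch: constant time, no task list walk. */
static void kernel_mode_switch(uint32_t closid, uint32_t rmid)
{
	kernel_default_closid = closid;
	kernel_default_rmid = rmid;
	kernel_gen++;	/* invalidate all per-task assignments at once */
}

/* Per-group assignment written by user space, applied to one task. */
static void task_set_kernel_ids(struct task_ids *t, uint32_t closid, uint32_t rmid)
{
	t->k_closid = closid;
	t->k_rmid = rmid;
	t->k_gen = kernel_gen;
}

/* Context switch: pick the IDs kernel work of @t should use. */
static void kernel_ids_for(const struct task_ids *t, uint32_t *closid, uint32_t *rmid)
{
	assert(closid && rmid);
	if (t->k_gen == kernel_gen) {
		*closid = t->k_closid;
		*rmid = t->k_rmid;
	} else {
		*closid = kernel_default_closid;
		*rmid = kernel_default_rmid;
	}
}
```

The cost moves from the mode switch to one extra comparison per context switch, which is the trade-off to weigh against holding tasklist_lock.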


Currently, this is not supported in AMD's PLZA implementation, but we need to keep this option open for MPAM.

Right. I expect PLZA to at least support "global_assign_ctrl_inherit_mon" mode
since that is the one we know somebody is waiting for. I am not actually sure about
"global_assign_ctrl_assign_mon" for PLZA. It is the variant intended to be implemented
by this RFC submission and does not seem difficult to implement but I have not really heard
any requests around it. Please do correct me if I missed anything here.


The interfaces proposed aim to maintain compatibility with existing user space tools while
adding support for all requirements expressed thus far in an efficient way. For an existing
user space tool there is no change in the meaning of any existing file, and no known existing
resource group files are made to disappear. A global configuration lets user space manage
allocations without needing to check and configure each control group, and even per-resource
group allocations can be managed from user space with a single read/write so that changes
can be made in the most efficient way.

What do you think?


I will start planning this work. Feel free to add more details.
I will have more questions as I start working on it.

I will separate GMBA work from this work.

Will send both series separately.

Thanks for the details and the summary.


Thank you very much.

Reinette