Re: [RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling

From: Ben Horgan

Date: Tue Feb 24 2026 - 04:40:59 EST


Hi Reinette,

On 2/23/26 16:38, Reinette Chatre wrote:
> Hi Ben,
>
> On 2/23/26 2:08 AM, Ben Horgan wrote:
>> On 2/20/26 02:53, Reinette Chatre wrote:
>
> ...
>
>>> Dedicated global allocations for kernel work, monitoring same for user space and kernel (MPAM)
>>> ----------------------------------------------------------------------------------------------
>>> 1. User space creates resource and monitoring groups for user tasks:
>>> /sys/fs/resctrl <= User space default allocations
>>> /sys/fs/resctrl/g1 <= User space allocations g1
>>> /sys/fs/resctrl/g1/mon_groups/g1m1 <= User space monitoring group g1m1
>>> /sys/fs/resctrl/g1/mon_groups/g1m2 <= User space monitoring group g1m2
>>> /sys/fs/resctrl/g2 <= User space allocations g2
>>> /sys/fs/resctrl/g2/mon_groups/g2m1 <= User space monitoring group g2m1
>>> /sys/fs/resctrl/g2/mon_groups/g2m2 <= User space monitoring group g2m2
>>>
>>> 2. User space creates resource and monitoring groups for kernel work (system has two PMG):
>>> /sys/fs/resctrl/kernel <= Kernel space allocations
>>> /sys/fs/resctrl/kernel/mon_data <= Kernel space monitoring for all of default and g1
>>> /sys/fs/resctrl/kernel/mon_groups/kernel_g2 <= Kernel space monitoring for all of g2
>>> 3. Set kernel mode to per_group_assign_ctrl_assign_mon:
>>> # echo per_group_assign_ctrl_assign_mon > info/kernel_mode
>>> - info/kernel_mode_assignment becomes visible and contains
>>> # cat info/kernel_mode_assignment
>>> //://
>>> g1//://
>>> g1/g1m1/://
>>> g1/g1m2/://
>>> g2//://
>>> g2/g2m1/://
>>> g2/g2m2/://
>>> - An optimization here may be to have the change to per_group_assign_ctrl_assign_mon mode be implemented
>>> similar to the change to global_assign_ctrl_assign_mon that initializes a global default. This can
>>> avoid keeping tasklist_lock for a long time to set all tasks' kernel CLOSID/RMID to default just for
>>> user space to likely change it.
>>> 4. Set groups to be used for kernel work:
>>> # echo '//:kernel//\ng1//:kernel//\ng1/g1m1/:kernel//\ng1/g1m2/:kernel//\ng2//:kernel/kernel_g2/\ng2/g2m1/:kernel/kernel_g2/\ng2/g2m2/:kernel/kernel_g2/\n' > info/kernel_mode_assignment
>>
>> Am I right in thinking that you want this in the info directory to avoid
>> adding files to the CTRL_MON/MON groups?
>
> I see this file as providing the same capability as you suggested in
> https://lore.kernel.org/lkml/aYyxAPdTFejzsE42@xxxxxxxxxxxxxxx/. The reason why I
> presented this as a single file is not because I am trying to avoid adding
> files to the CTRL_MON/MON groups but because I believe such interface enables
> resctrl to have more flexibility and support more scenarios for optimization.
>
> As you mentioned in your proposal the solution enables a single write to move
> a task. As I thought through what resctrl needs to do on such write I saw a lot
> of similarities with mongrp_reparent() that loops through all the tasks via
> for_each_process_thread() while holding tasklist_lock. Issues with mongrp_reparent()
> holding tasklist_lock for a long time are described in [1].
>
> While the single file does not avoid taking tasklist_lock it does give the user the
> ability to set kernel group for multiple user groups with a single write. When user space
> does so I believe it is possible for resctrl to have an optimization that takes tasklist_lock
> just once and make changes to tasks belonging to all groups while looping through all tasks on
> system just once. With files within the CTRL_MON/MON groups setting kernel group for
> multiple user groups will require multiple writes from user space where each write requires
> looping through tasks while holding tasklist_lock during each loop. From what I learned
> from [1] something like this can be very disruptive to the rest of the system.
>
> In summary, I see having this single file provide the same capability as the
> one-file-per-CTRL_MON/MON group since user can choose to set kernel group for user
> group one at a time but it also gives more flexibility to resctrl for optimization.
>
> Nothing is set in stone here. There is still flexibility in this proposal to support
> PARTID and PMG assignment with a single file in each CTRL_MON/MON group if we find that
> it has more benefits. resctrl can still expose a "per_group_assign_ctrl_assign_mon" mode
> but instead of making "info/kernel_mode_assignment" visible when it is enabled the control files
> in CTRL_MON/MON groups are made visible ... even in this case resctrl could still add the single
> file later if deemed necessary at that time.
>
> Considering all this, do you think resctrl should rather start with a file in each
> CTRL_MON/MON group?

From what you say, it sounds like the optimization opportunities granted
by having a single file will be necessary with some usage patterns and
so I'd be happy to start with just the single
"info/kernel_mode_assignment" file. It does mean that you need to
consider more than the current CTRL_MON directory when reading or
writing configuration but I don't see any real problem there.
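
To check my understanding of the batching argument, here is a rough
user-space model in Python. It is only a sketch and not kernel code:
the task list is a plain Python list standing in for the
for_each_process_thread() walk under tasklist_lock, and the helper
names (parse_group, parse_assignment, apply_single_pass) are made up
for illustration. It assumes each line of the file has the form
"ctrl/mon/:kernel_ctrl/kernel_mon/" as in the example above, with an
empty name meaning the default group.

```python
from typing import Dict, List, Tuple

# (CTRL_MON group, MON group); "" stands for the default group (assumption).
Group = Tuple[str, str]


def parse_group(field: str) -> Group:
    """Parse 'ctrl/mon/' into (ctrl, mon)."""
    parts = field.split("/")
    if len(parts) != 3 or parts[2] != "":
        raise ValueError(f"malformed group field: {field!r}")
    return parts[0], parts[1]


def parse_assignment(text: str) -> Dict[Group, Group]:
    """Parse one multi-line write into {user group: kernel group}."""
    mapping: Dict[Group, Group] = {}
    for line in text.splitlines():
        if not line:
            continue
        user, sep, kernel = line.partition(":")
        if not sep:
            raise ValueError(f"missing ':' in line: {line!r}")
        mapping[parse_group(user)] = parse_group(kernel)
    return mapping


def apply_single_pass(tasks: List[dict], mapping: Dict[Group, Group]) -> int:
    """Update every matching task in one walk; return tasks visited.

    The loop models a single traversal of all tasks with the lock
    taken once, regardless of how many groups the write names.
    """
    visited = 0
    for task in tasks:
        visited += 1
        target = mapping.get(task["user"])
        if target is not None:
            task["kernel"] = target
    return visited
```

In this model a write naming N groups still visits each task once,
whereas N separate per-group writes would each repeat the full walk.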

>
> Reinette
>
> [1] https://lore.kernel.org/lkml/CALPaoCh0SbG1+VbbgcxjubE7Cc2Pb6QqhG3NH6X=WwsNfqNjtA@xxxxxxxxxxxxxx/

Thanks,

Ben