Re: [RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling

From: Reinette Chatre

Date: Tue Feb 24 2026 - 11:17:53 EST


Hi Ben,

On 2/24/26 1:36 AM, Ben Horgan wrote:
> Hi Reinette,
>
> On 2/23/26 16:38, Reinette Chatre wrote:
>> Hi Ben,
>>
>> On 2/23/26 2:08 AM, Ben Horgan wrote:
>>> On 2/20/26 02:53, Reinette Chatre wrote:
>>
>> ...
>>
>>>> Dedicated global allocations for kernel work, monitoring same for user space and kernel (MPAM)
>>>> ----------------------------------------------------------------------------------------------
>>>> 1. User space creates resource and monitoring groups for user tasks:
>>>> /sys/fs/resctrl <= User space default allocations
>>>> /sys/fs/resctrl/g1 <= User space allocations g1
>>>> /sys/fs/resctrl/g1/mon_groups/g1m1 <= User space monitoring group g1m1
>>>> /sys/fs/resctrl/g1/mon_groups/g1m2 <= User space monitoring group g1m2
>>>> /sys/fs/resctrl/g2 <= User space allocations g2
>>>> /sys/fs/resctrl/g2/mon_groups/g2m1 <= User space monitoring group g2m1
>>>> /sys/fs/resctrl/g2/mon_groups/g2m2 <= User space monitoring group g2m2
>>>>
>>>> 2. User space creates resource and monitoring groups for kernel work (system has two PMG):
>>>> /sys/fs/resctrl/kernel <= Kernel space allocations
>>>> /sys/fs/resctrl/kernel/mon_data <= Kernel space monitoring for all of default and g1
>>>> /sys/fs/resctrl/kernel/mon_groups/kernel_g2 <= Kernel space monitoring for all of g2
>>>> 3. Set kernel mode to per_group_assign_ctrl_assign_mon:
>>>> # echo per_group_assign_ctrl_assign_mon > info/kernel_mode
>>>> - info/kernel_mode_assignment becomes visible and contains
>>>> # cat info/kernel_mode_assignment
>>>> //://
>>>> g1//://
>>>> g1/g1m1/://
>>>> g1/g1m2/://
>>>> g2//://
>>>> g2/g2m1/://
>>>> g2/g2m2/://
>>>> - An optimization here may be to have the change to per_group_assign_ctrl_assign_mon mode be implemented
>>>> similar to the change to global_assign_ctrl_assign_mon that initializes a global default. This can
>>>> avoid keeping tasklist_lock for a long time to set all tasks' kernel CLOSID/RMID to default just for
>>>> user space to likely change it.
>>>> 4. Set groups to be used for kernel work:
>>>> # printf '//:kernel//\ng1//:kernel//\ng1/g1m1/:kernel//\ng1/g1m2/:kernel//\ng2//:kernel/kernel_g2/\ng2/g2m1/:kernel/kernel_g2/\ng2/g2m2/:kernel/kernel_g2/\n' > info/kernel_mode_assignment
>>>
>>> Am I right in thinking that you want this in the info directory to avoid
>>> adding files to the CTRL_MON/MON groups?
>>
>> I see this file as providing the same capability as you suggested in
>> https://lore.kernel.org/lkml/aYyxAPdTFejzsE42@xxxxxxxxxxxxxxx/. The reason why I
>> presented this as a single file is not because I am trying to avoid adding
>> files to the CTRL_MON/MON groups but because I believe such an interface enables
>> resctrl to have more flexibility and support more scenarios for optimization.
>>
>> As you mentioned in your proposal the solution enables a single write to move
>> a task. As I thought through what resctrl needs to do on such a write I saw a lot
>> of similarities with mongrp_reparent() that loops through all the tasks via
>> for_each_process_thread() while holding tasklist_lock. Issues with mongrp_reparent()
>> holding tasklist_lock for a long time are described in [1].
>>
>> While the single file does not avoid taking tasklist_lock, it does give the user the
>> ability to set the kernel group for multiple user groups with a single write. When user space
>> does so, I believe resctrl can have an optimization that takes tasklist_lock just once and
>> updates the tasks belonging to all named groups in a single loop through all tasks on the
>> system. With files within the CTRL_MON/MON groups, setting the kernel group for multiple
>> user groups requires multiple writes from user space, and each write requires its own
>> loop through all tasks while holding tasklist_lock. From what I learned from [1],
>> something like this can be very disruptive to the rest of the system.
>>
>> In summary, I see this single file as providing the same capability as the
>> one-file-per-CTRL_MON/MON group approach, since the user can still choose to set the kernel
>> group for one user group at a time, but it also gives resctrl more flexibility for optimization.
>>
>> Nothing is set in stone here. There is still flexibility in this proposal to support
>> PARTID and PMG assignment with a single file in each CTRL_MON/MON group if we find that
>> it has more benefits. resctrl can still expose a "per_group_assign_ctrl_assign_mon" mode,
>> but instead of making "info/kernel_mode_assignment" visible when it is enabled, the control files
>> in the CTRL_MON/MON groups are made visible ... even in this case resctrl could still add the single
>> file later if deemed necessary at that time.
>>
>> Considering all this, do you think resctrl should rather start with a file in each
>> CTRL_MON/MON group?
>
> From what you say, it sounds like the optimization opportunities granted
> by having a single file will be necessary with some usage patterns and
> so I'd be happy to start with just the single
> "info/kernel_mode_assignment" file. It does mean that you need to
> consider more than the current CTRL_MON directory when reading or
> writing configuration but I don't see any real problem there.

When reading the global file it will display all groups, yes. A write needs to
name only the group(s) being modified (similar to the schemata file).
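
As a rough userspace illustration of that write behavior (not kernel code, and
not a settled interface), the sketch below assumes each line has the form
"CTRL/MON/:KCTRL/KMON/", which is inferred only from the examples earlier in
this thread. The names parse_assignment() and apply_write() and the in-memory
table are hypothetical.

```python
# Userspace model only; not kernel code. All names here are hypothetical.

def parse_assignment(line):
    """Split 'CTRL/MON/:KCTRL/KMON/' into ((ctrl, mon), (kctrl, kmon))."""
    user, sep, kernel = line.partition(":")
    if not sep:
        raise ValueError(f"missing ':' in {line!r}")

    def split_group(text):
        parts = text.split("/")
        # 'a/b/' splits into ['a', 'b', '']; empty names mean "default".
        if len(parts) != 3 or parts[2] != "":
            raise ValueError(f"malformed group {text!r}")
        return parts[0], parts[1]

    return split_group(user), split_group(kernel)


def apply_write(table, data):
    """Update only the entries named in 'data'; leave all others untouched."""
    # Parse and validate everything first so a bad line changes nothing.
    updates = [parse_assignment(line) for line in data.splitlines() if line]
    for user, _kernel in updates:
        if user not in table:
            raise KeyError(f"unknown group {user!r}")
    for user, kernel in updates:
        table[user] = kernel
    return table
```

In this model a write of "g2/g2m1/:kernel/kernel_g2/" changes one entry and
leaves every other group's assignment as it was, which is the schemata-like
behavior described above.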

Babu and I did speculate a bit on other interactions with "info/kernel_mode_assignment"
in https://lore.kernel.org/lkml/0645bba3-6121-41d4-b627-323faf1089b7@xxxxxxxxx/ and
resctrl may need to adjust how a task's group membership is managed. resctrl could cache
some state, or manage task membership in an entirely different way, like
what Peter proposed in https://lore.kernel.org/lkml/20240325172707.73966-1-peternewman@xxxxxxxxxx/

If task group membership management becomes "cheap" then the resctrl interface can be
reconsidered.
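
To make the cost argument concrete, here is a small userspace model, again
not kernel code. It assumes each walk over the task list corresponds to one
for_each_process_thread() loop under tasklist_lock. The task dictionaries and
the names apply_batched() and apply_one_by_one() are hypothetical.

```python
# Userspace model only. 'tasks' stands in for the kernel task list and the
# return value counts walks that would each hold tasklist_lock.

def apply_batched(tasks, updates):
    """One walk over all tasks applies every update in 'updates'."""
    for task in tasks:
        kernel = updates.get(task["group"])
        if kernel is not None:
            task["kernel_group"] = kernel
    return 1  # a single task-list walk, however many groups changed


def apply_one_by_one(tasks, updates):
    """One walk per updated group, as with one write per per-group file."""
    walks = 0
    for group, kernel in updates.items():
        walks += 1
        for task in tasks:
            if task["group"] == group:
                task["kernel_group"] = kernel
    return walks
```

Both functions leave the tasks in the same final state. The difference is
only in the number of walks: one for the batched write versus one per group
for separate writes.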

Reinette

>
>>
>> Reinette
>>
>> [1] https://lore.kernel.org/lkml/CALPaoCh0SbG1+VbbgcxjubE7Cc2Pb6QqhG3NH6X=WwsNfqNjtA@xxxxxxxxxxxxxx/
>
> Thanks,
>
> Ben
>