Re: [RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling

From: Reinette Chatre

Date: Tue Feb 17 2026 - 13:51:59 EST

Hi Ben,

On 2/16/26 7:18 AM, Ben Horgan wrote:
> On Thu, Feb 12, 2026 at 10:37:21AM -0800, Reinette Chatre wrote:
>> On 2/12/26 5:55 AM, Ben Horgan wrote:
>>> On Wed, Feb 11, 2026 at 02:22:55PM -0800, Reinette Chatre wrote:
>>>> On 2/11/26 8:40 AM, Ben Horgan wrote:
>>>>> On Tue, Feb 10, 2026 at 10:04:48AM -0800, Reinette Chatre wrote:

>>>>>> It looks like MPAM has a few more capabilities here and the Arm levels are numbered differently
>>>>>> with EL0 meaning user space. We should thus aim to keep things as generic as possible. For example,
>>>>>> instead of CPL0 using something like "kernel" or ... ?
>>>>>
>>>>> Yes, PLZA does open up more possibilities for MPAM usage. I've talked to James
>>>>> internally and here are a few thoughts.
>>>>>
>>>>> If the use case is just an option to run all tasks with the same closid/rmid
>>>>> (partid/pmg) configuration when they are running in the kernel then I'd favour a
>>>>> mount option. The resctrl filesystem interface doesn't need to change and
>>>>
>>>> I view mount options as an interface of last resort. Why would a mount option be needed
>>>> in this case? The existence of the file used to configure the feature seems sufficient?
>>>
>>> If we are taking away a closid from the user then the number of CTRL_MON groups
>>> that can be created changes. It seems reasonable for user-space to expect
>>> num_closid to be a fixed value.
>>
>> I do not see why we need to take away a CLOSID from the user. Consider a user space that
>
> Yes, it is just slightly simpler to take away a CLOSID, but we could just have the
> default CLOSID also be used for the kernel. I would be ok with a file saying the
> mode, like the mbm_event file does for counter assignment. It is slightly misleading
> that a configuration file is under info but necessary as we don't have another
> location global to the resctrl mount.

Indeed, the "info" directory has evolved more into a "config" directory.

>> runs with just two resource groups, for example, "high priority" and "low priority", it seems
>> reasonable to make it possible to let the "low priority" tasks run with "high priority"
>> allocations when in kernel space without needing to dedicate a new CLOSID? More reasonable
>> when only considering memory bandwidth allocation though.
>>
>>>
>>>>
>>>> Also ...
>>>>
>>>> I do not think resctrl should unnecessarily place constraints on what the hardware
>>>> features are capable of. As I understand, both PLZA and MPAM support use cases where
>>>> tasks may use different CLOSID/RMID (PARTID/PMG) when running in the kernel. Limiting
>>>> this to only one CLOSID/PARTID seems like an unmotivated constraint to me at the moment.
>>>> This may be because I am not familiar with all the requirements here so please do
>>>> help with insight on how the hardware feature is intended to be used as it relates
>>>> to its design.
>>>>
>>>> We have to be very careful when constraining a feature this much. If resctrl does something
>>>> like this it essentially restricts what users could do forever.
>>>
>>> Indeed, we don't want to unnecessarily restrict ourselves here. I was hoping a
>>> fixed kernel CLOSID/RMID configuration option might just give all we need for
>>> usecases we know we have and be minimally intrusive enough to not preclude a
>>> more featureful PLZA later when new usecases come about.
>>
>> Having ability to grow features would be ideal. I do not see how a fixed kernel CLOSID/RMID
>> configuration leaves room to build on top though. Could you please elaborate?
>
> If we initially go with a single new configuration file, e.g. kernel_mode, which
> could be "match_user" or "use_root", this would be the only initial change to the
> interface needed. If more usecases present themselves a new mode could be added,
> e.g. "configurable", and an interface to actually change the rmid/closid for the
> kernel could be added.

Something like this could be a base to work from. I think only the two ("match_user" and
"use_root") are a bit limiting for even the initial implementation though.
As I understand, "use_root" implies using the allocations of the default group but
does not indicate what MON group (which RMID/PMG) should be used to monitor the
work done in kernel space. A way to specify the actual group may be needed?
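
To illustrate the gap, here is a rough user-space model in Python (not kernel
code). Everything in it is hypothetical: the Ids type, the kernel_ids() helper,
and the assumption that the default group uses CLOSID 0 and RMID 0. It only
shows that "match_user" fully determines both IDs while "use_root" determines
the allocation but leaves the monitoring ID open.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Ids:
    closid: int
    rmid: Optional[int]  # None: the mode does not say which RMID/PMG to use


# Assumption for illustration: the default group uses CLOSID 0 and RMID 0.
DEFAULT_GROUP = Ids(closid=0, rmid=0)


def kernel_ids(mode: str, user: Ids) -> Ids:
    """Return the IDs a task would run with while in kernel space."""
    if mode == "match_user":
        # Kernel work inherits both allocation and monitoring from user space.
        return user
    if mode == "use_root":
        # Allocation comes from the default group; the monitoring group is
        # not implied by the mode name, so it is left unspecified here.
        return Ids(closid=DEFAULT_GROUP.closid, rmid=None)
    raise ValueError(f"unknown kernel_mode: {mode!r}")
```

A mode that names an actual group would let the second branch return a
concrete RMID/PMG instead of None.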

>> I wonder if the benefit of the fixed CLOSID/RMID is perhaps mostly in the cost of
>> context switching which I do not think is a concern for MPAM but it may be for PLZA?
>>
>> One option to support fixed kernel CLOSID/RMID at the beginning and leave room to build
>> may be to create the kernel_group or "tasks_kernel" interface as a baseline but in first
>> implementation only allow user space to write the same group to all "kernel_group" files or
>> to only allow to write to one of the "tasks_kernel" files in the resctrl fs hierarchy. At
>> that time the associated CLOSID/RMID would become the "fixed configuration" and attempts to
>> write to others can return "ENOSPC"?
>
> I think we'd have to be sure of the final interface if we go this way.

I do not think we should aim to know the final interface since that requires knowing all future
hardware features and their implementations in advance. Instead we should aim to have something
that we can build on that is accompanied by documentation that supports future flexibility (some may
refer to this as "weasel words").

>> From what I can tell this still does not require to take away a CLOSID/RMID from user space
>> though. Dedicating a CLOSID/RMID to kernel work can still be done but would be under the control
>> of the user who can, for example, leave the "tasks" and "cpus" files empty.
>>
>>> One complication for the fixed kernel CLOSID/RMID option is that for x86 you
>>> may want to be able to monitor a task's resource usage whether or not it is in
>>> the kernel or userspace and so only have a fixed CLOSID. However, for MPAM this
>>> wouldn't work as PMG (~RMID) is scoped to PARTID (~CLOSID).
>>>
>>>>
>>>>> userspace software doesn't need to change. This could either take away a
>>>>> closid/rmid from userspace and dedicate it to the kernel or perhaps have a
>>>>> policy to have the default group as the kernel group. If you use the default
>>>>
>>>> Similar to above I do not see PLZA or MPAM preventing sharing of CLOSID/RMID (PARTID/PMG)
>>>> between user space and kernel. I do not see a motivation for resctrl to place such
>>>> constraint.
>>>>
>>>>> configuration, at least for MPAM, the kernel may not be running at the highest
>>>>> priority as a minimum bandwidth can be used to give a priority boost. (Once we
>>>>> have a resctrl schema for this.)
>>>>>
>>>>> It could be useful to have something a bit more featureful though. Is there a
>>>>> need for the two mappings, task->cpl0 config and task->cpl1 config, to be independent or
>>>>> would a task->(cpl0 config, cpl1 config) be sufficient? It seems awkward that
>>>>> it's not a single write to move a task. If a single mapping is sufficient, then
>>>>
>>>> Moving a task in x86 is currently two writes by writing the CLOSID and RMID separately.
>>>> I think the MPAM approach is better and there may be opportunity to do this in a similar
>>>> way and both architectures use the same field(s) in the task_struct.
>>>
>>> I was referring to the userspace file write but unifying on the same fields in
>>> task_struct could be good. The single write is necessary for MPAM as PMG is
>>> scoped to PARTID and I don't think x86 behaviour changes if it moves to the same
>>> approach.
>>>
>>
>> ah - I misunderstood. You are suggesting to have one file that user writes to
>> to set both user space and kernel space CLOSID/RMID? This sounds like what the
>
> Yes, the kernel_groups idea does partially have this as once you've set the
> kernel_group for a CTRL_MON or MON group then the user space configuration
> dictates the kernel space configuration. As you pointed out, this is also
> a drawback of the kernel_groups idea.
>
>> existing "tasks" file does but only supports the same CLOSID/RMID for both user
>> space and kernel space. To support the new hardware features where the CLOSID/RMID
>> can be different we cannot just change "tasks" interface and would need to keep it
>> backward compatible. So far I assumed that it would be ok for the "tasks" file
>> to essentially get new meaning as the CLOSID/RMID for just user space work, which
>> seems to require a second file for kernel space as a consequence? So far I have
>> not seen an option that does not change meaning of the "tasks" file.
>
> Would it make sense to have some new type of entries in the tasks file,
> e.g. k_ctrl_<pid>, k_mon_<pid> to say, in the kernel, use the closid of this
> CTRL_MON for this task pid or use the rmid of this CTRL_MON/MON group for this task
> pid? We would still probably need separate files for the cpu configuration.

I am obligated to nack such a change to the tasks file since it would impact any
existing user space parsing of this file.
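
As an illustration of the concern, a minimal sketch of how a user space tool
might read the tasks file today, assuming one numeric PID per line. The
parse_tasks() helper is hypothetical and not taken from any particular tool.

```python
from typing import List


def parse_tasks(text: str) -> List[int]:
    """Parse the contents of a resctrl "tasks" file: one numeric PID per line."""
    # int() raises ValueError on anything that is not a plain number, which
    # is what a parser like this would hit on an entry such as "k_ctrl_<pid>".
    return [int(line) for line in text.split()]
```

Such a parser handles "1\n2\n3\n" fine but raises ValueError as soon as the
file contains a prefixed entry.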

>
> If separate files make more sense, then we might need 2 extra tasks files to
> decouple closid and rmid, e.g. tasks_k_ctrl and task_k_mon. The task_k_mon would
> be in all CTRL_MON and MON groups and determine the rmid and tasks_k_ctrl just
> in a CTRL_MON group and determine a closid.

This is possible, yes.
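
For reference, the layout described above could be modelled as below. This is
only a sketch of the proposal as I read it, using the file names exactly as
written in the quoted text; kernel_task_files() is a hypothetical helper.

```python
from typing import List


def kernel_task_files(group_kind: str) -> List[str]:
    """Return the proposed kernel task files present in a group of this kind."""
    if group_kind == "CTRL_MON":
        # A CTRL_MON group can determine both the closid and the rmid.
        return ["tasks_k_ctrl", "task_k_mon"]
    if group_kind == "MON":
        # A MON group only determines the rmid.
        return ["task_k_mon"]
    raise ValueError(f"unknown group kind: {group_kind!r}")
```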

>>>>> a single new file, kernel_group, per CTRL_MON group (maybe MON groups) as
>>>>> suggested above but rather than a task that file could hold a path to the
>>>>> CTRL_MON/MON group that provides the kernel configuration for tasks running in
>>>>> that group. So that this can be transparent to existing software an empty string
>>>>
>>>> Something like this would force all tasks of a group to run with the same CLOSID/RMID
>>>> (PARTID/PMG) when in kernel space. This seems to restrict what the hardware supports
>>>> and may reduce the possible use case of this feature.
>>>>
>>>> For example,
>>>> - There may be a scenario where there is a set of tasks with a particular allocation
>>>> when running in user space but when in kernel these tasks benefit from different
>>>> allocations. Consider for example below arrangement where tasks 1, 2, and 3 run in
>>>> user space with allocations from resource_groupA. While these tasks are ok with this
>>>> allocation when in user space they have different requirements when it comes to
>>>> kernel space. There may be a resource_groupB that allocates a lot of resources ("high
>>>> priority") that task 1 should use for kernel work and a resource_groupC that allocates
>>>> fewer resources that tasks 2 and 3 should use for kernel work ("medium priority").
>>>>
>>>> resource_groupA:
>>>> schemata: <average allocations that work for tasks 1, 2, and 3 when in user space>
>>>> tasks when in user space: 1, 2, 3
>>>>
>>>> resource_groupB:
>>>> schemata: <high priority allocations>
>>>> tasks when in kernel space: 1
>>>>
>>>> resource_groupC:
>>>> schemata: <medium priority allocations>
>>>> tasks when in kernel space: 2, 3
>>>
>>> I'm not sure if this would happen in the real world or not.
>>
>> Ack. I would like to echo Tony's request for feedback from resctrl users
>> https://lore.kernel.org/lkml/aYzcpuG0PfUaTdqt@agluck-desk3/
>
> Indeed. This is all getting a bit complicated.
>

ack

Reinette