Re: [RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling

From: Ben Horgan

Date: Wed Feb 18 2026 - 04:35:43 EST


Hi Stephane,

On 2/18/26 06:22, Stephane Eranian wrote:
> On Tue, Feb 17, 2026 at 7:56 AM Ben Horgan <ben.horgan@xxxxxxx> wrote:
>>
>> Hi Babu,
>>
>> On 2/16/26 22:52, Moger, Babu wrote:
>>> Hi Ben,
>>>
>>> On 2/16/2026 9:41 AM, Ben Horgan wrote:
>>>> Hi Babu, Reinette,
>>>>
>>>> On 2/14/26 00:10, Reinette Chatre wrote:
>>>>> Hi Babu,
>>>>>
>>>>> On 2/13/26 8:37 AM, Moger, Babu wrote:
>>>>>> Hi Reinette,
>>>>>>
>>>>>> On 2/10/2026 10:17 AM, Reinette Chatre wrote:
>>>>>>> Hi Babu,
>>>>>>>
>>>>>>> On 1/28/26 9:44 AM, Moger, Babu wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/28/2026 11:41 AM, Moger, Babu wrote:
>>>>>>>>>> On Wed, Jan 28, 2026 at 10:01:39AM -0600, Moger, Babu wrote:
>>>>>>>>>>> On 1/27/2026 4:30 PM, Luck, Tony wrote:
>>>>>>>>>> Babu,
>>>>>>>>>>
>>>>>>>>>> I've read a bit more of the code now and I think I understand more.
>>>>>>>>>>
>>>>>>>>>> Some useful additions to your explanation.
>>>>>>>>>>
>>>>>>>>>> 1) Only one CTRL group can be marked as PLZA
>>>>>>>>>
>>>>>>>>> Yes. Correct.
>>>>>>>
>>>>>>> Why limit it to one CTRL_MON group and why not support it for MON
>>>>>>> groups?
>>>>>>
>>>>>> There can be only one PLZA configuration in a system. The values in
>>>>>> the MSR_IA32_PQR_PLZA_ASSOC register (RMID, RMID_EN, CLOSID,
>>>>>> CLOSID_EN) must be identical across all logical processors. The only
>>>>>> field that may differ is PLZA_EN.
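As a rough illustration of that constraint, the sketch below models the register as a plain struct and checks that all CPUs agree on every field except PLZA_EN. The struct, its layout and plza_config_consistent() are hypothetical; they do not reflect the real MSR bit positions or any existing resctrl helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical software view of MSR_IA32_PQR_PLZA_ASSOC. Field names
 * follow this thread; the layout is an assumption, not the real MSR.
 */
struct plza_assoc {
	uint32_t rmid;
	bool rmid_en;
	uint32_t closid;
	bool closid_en;
	bool plza_en;	/* the only field allowed to differ between CPUs */
};

/* Return true if all CPUs agree on every field except plza_en. */
static bool plza_config_consistent(const struct plza_assoc *cpus, int nr_cpus)
{
	assert(nr_cpus > 0);
	for (int i = 1; i < nr_cpus; i++) {
		if (cpus[i].rmid != cpus[0].rmid ||
		    cpus[i].rmid_en != cpus[0].rmid_en ||
		    cpus[i].closid != cpus[0].closid ||
		    cpus[i].closid_en != cpus[0].closid_en)
			return false;
	}
	return true;
}
```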
>>>>
>>>> Does this have any effect on hypervisors?
>>>
>>> Because the hypervisor runs at CPL0, there could be some use case. I have
>>> not completely understood that part.
>>>
>>>>
>>>>>
>>>>> ah - this is a significant part that I missed. Since this is a per-CPU
>>>>> register it seems
>>>>
>>>> I also missed that.
>>>>
>>>>> to have the ability for expanded use in the future, where different CLOSID
>>>>> and RMID may be written to it? Is PLZA leaving room for such a future
>>>>> enhancement, or does the spec contain the text that states "The values in
>>>>> the MSR_IA32_PQR_PLZA_ASSOC register (RMID, RMID_EN, CLOSID, CLOSID_EN)
>>>>> must be identical across all logical processors."? That is, "forever and
>>>>> always"?
>>>>>
>>>>> If I understand correctly, MPAM could have different PARTID and PMG for
>>>>> kernel use, so we need to consider these different architectural behaviors.
>>>>
>>>> Yes, MPAM has a per-CPU register, MPAM1_EL1.
>>>>
>>>
>>> oh ok.
>>>
>>>>>
>>>>>> I was initially unsure which RMID should be used when PLZA is
>>>>>> enabled on MON groups.
>>>>>>
>>>>>> After re-evaluating, enabling PLZA on MON groups is still feasible:
>>>>>>
>>>>>> 1. Only one group in the system can have PLZA enabled.
>>>>>> 2. If PLZA is enabled on CTRL_MON group then we cannot enable PLZA
>>>>>> on MON group.
>>>>>> 3. If PLZA is enabled on the CTRL_MON group, then the CLOSID and
>>>>>> RMID of the CTRL_MON group can be written.
>>>>>> 4. If PLZA is enabled on a MON group, then the CLOSID of the
>>>>>> CTRL_MON group can be used, while the RMID of the MON group can be
>>>>>> written.
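The four rules above could be expressed roughly as follows. This is a minimal sketch with a hypothetical, heavily simplified group model (not the kernel's struct rdtgroup); plza_enable() and plza_owner are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified resctrl group model. */
enum grp_type { GRP_CTRL_MON, GRP_MON };

struct grp {
	enum grp_type type;
	uint32_t closid;	/* own CLOSID for CTRL_MON; unused for MON */
	uint32_t rmid;
	struct grp *parent;	/* owning CTRL_MON group of a MON group */
};

struct plza_sel {
	uint32_t closid;
	uint32_t rmid;
};

/* Rule 1: at most one group system-wide may have PLZA enabled. */
static struct grp *plza_owner;

/*
 * Enable PLZA on @g and report the CLOSID/RMID pair to program.
 * Returns false if another group already owns PLZA (rules 1 and 2).
 */
static bool plza_enable(struct grp *g, struct plza_sel *out)
{
	if (plza_owner && plza_owner != g)
		return false;

	if (g->type == GRP_CTRL_MON) {
		/* Rule 3: CLOSID and RMID both come from the CTRL_MON group. */
		out->closid = g->closid;
	} else {
		/* Rule 4: parent's CLOSID, MON group's own RMID. */
		assert(g->parent && g->parent->type == GRP_CTRL_MON);
		out->closid = g->parent->closid;
	}
	out->rmid = g->rmid;
	plza_owner = g;
	return true;
}
```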
>>>>
>>>> Given that CLOSID and RMID are fixed once set in the PLZA configuration,
>>>> could this be simplified by just assuming they have the values of the
>>>> default group, CLOSID=0 and RMID=0, and letting the user base their
>>>> configuration on that?
>>>>
>>>
>>> I didn't understand this question. There are 16 CLOSIDs and 1024 RMIDs.
>>> We can use any one of these to enable PLZA. It is not fixed in that sense.
>>
>> Sorry, I wasn't clear. What I'm trying to understand is what you gain from
>> this flexibility. Given that the CLOSID and RMID values are just
>> identifiers within the hardware and have only the meaning they are given
>> by the grouping and controls/monitors set up by resctrl (or any other
>> software interface), would you lose anything by just saying the PLZA
>> group has CLOSID=0 and RMID=0? Is there value in changing the PLZA
>> CLOSID and RMID, or can the same effect be achieved by just changing the
>> resctrl configuration?
>>
> Not quite.
> When you enter the kernel, you want to run unthrottled to avoid
> priority inversion situations.
> But at the same time, you still want to be able to monitor the
> bandwidth for your thread or job, i.e., keep the same RMID you have
> in user space.

Thanks for sharing your use case.
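To make that use case concrete, here is a small model of which tags would be in effect at CPL0. It assumes, based only on the field names discussed in this thread, that when PLZA_EN is set each PLZA field whose *_EN bit is set overrides the corresponding MSR_IA32_PQR_ASSOC value; the structs and effective_tags() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Association used in user space (from MSR_IA32_PQR_ASSOC). */
struct pqr_assoc {
	uint32_t closid, rmid;
};

/* Hypothetical model of MSR_IA32_PQR_PLZA_ASSOC; layout is assumed. */
struct plza_assoc {
	uint32_t closid, rmid;
	bool closid_en, rmid_en, plza_en;
};

/*
 * Assumed semantics: at CPL0 with plza_en set, each PLZA field whose
 * *_EN bit is set replaces the PQR_ASSOC value; otherwise the
 * PQR_ASSOC value stays in effect.
 */
static struct pqr_assoc effective_tags(struct pqr_assoc user,
				       struct plza_assoc plza, bool cpl0)
{
	struct pqr_assoc eff = user;

	if (cpl0 && plza.plza_en) {
		if (plza.closid_en)
			eff.closid = plza.closid;
		if (plza.rmid_en)
			eff.rmid = plza.rmid;
	}
	return eff;
}
```

Under that assumption, CLOSID_EN=1 with RMID_EN=0 would let kernel work switch CLOSID while keeping the task's user-space RMID, which is the behavior described here.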

>
> The kernel is by construction shared by all threads running in the
> system. It should run unrestricted or with the
> bandwidth allocated to the highest priority tasks.
>
> PLZA should not change the RMID at all.

Would setting RMID_EN=0 in the PLZA configuration cover this use case?

Unfortunately, this isn't possible when rmid/pmg is scoped to
closid/partid, as is the case in MPAM, i.e. the monitors require a match
on the pair (closid, rmid), or (partid, pmg) in MPAM terms. Hence, I think
we need to support the case where both RMID and CLOSID change.
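A simplified model of why changing only the CLOSID/PARTID breaks monitoring on MPAM, assuming the monitor is configured to match on both PARTID and PMG. The structs and mon_counts() are illustrative only, not the MPAM driver's API, and a real monitor has more filter options.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Tags carried by a memory request, as seen by an MPAM MSC. */
struct mpam_tag {
	uint16_t partid;
	uint8_t pmg;
};

/* Simplified monitor filter, assumed to match both PARTID and PMG. */
struct mpam_mon {
	uint16_t partid;
	uint8_t pmg;
};

/* PMG is scoped to PARTID, so the monitor counts only on a full pair match. */
static bool mon_counts(struct mpam_mon m, struct mpam_tag t)
{
	return m.partid == t.partid && m.pmg == t.pmg;
}
```

Kernel traffic tagged with the PLZA PARTID but the task's original PMG would not be counted by the task's monitor, so keeping only the "RMID" does not preserve the user-space view.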

>
> You could obtain the same effect by changing the quota for each CLOSID
> on kernel entry. But that would likely be more expensive, and you would
> have to do this at every possible entry and exit point (restoring on exit).
>
>
>
>> I was also wondering if using the default group this way would mean that
>> you wouldn't need to reserve the group for only kernel use.
>>
>>>
>>>
>>>>>>
>>>>>> I am thinking this approach should work.
>>>>>>
>>>>>>>
>>>>>>> Limiting it to a single CTRL group seems restrictive in a few ways:
>>>>>>> 1) It requires that the "PLZA" group has a dedicated CLOSID. This reduces
>>>>>>>    the number of use cases that can be supported. Consider, for example,
>>>>>>>    an existing "high priority" resource group and a "low priority"
>>>>>>>    resource group. The user may just want to let the tasks in the "low
>>>>>>>    priority" resource group run as "high priority" when in CPL0. This of
>>>>>>>    course may depend on what resources are allocated (cache, for example,
>>>>>>>    may need more care), but if the user is only interested in memory
>>>>>>>    bandwidth allocation this seems a reasonable use case?
>>>>>>> 2) Similar to what Tony [1] mentioned, this does not enable what the
>>>>>>>    hardware is capable of in terms of the number of different control
>>>>>>>    groups/CLOSIDs that can be assigned to MSR_IA32_PQR_PLZA_ASSOC. Why
>>>>>>>    limit PLZA to one CLOSID?
>>>>>>> 3) The feature seems to support RMID in MSR_IA32_PQR_PLZA_ASSOC similar
>>>>>>>    to MSR_IA32_PQR_ASSOC. With this, it should be possible for user space
>>>>>>>    to, for example, create a resource group that contains tasks of
>>>>>>>    interest and create a monitor group within it that monitors all tasks'
>>>>>>>    bandwidth usage when in CPL0. This will give user space better insight
>>>>>>>    into system behavior and, from what I can tell, is supported by the
>>>>>>>    feature but not enabled?
>>>>>>
>>>>>>
>>>>>> Yes, as long as PLZA is enabled on only one group in the entire system.
>>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>>> 2) It can't be the root/default group
>>>>>>>>>
>>>>>>>>> This is something I added to keep the default group undisturbed.
>>>>>>>
>>>>>>> Why was this needed?
>>>>>>>
>>>>>>
>>>>>> With the new approach mentioned above, we can enable it in the default
>>>>>> group as well.
>>>>>>
>>>>>>>>>
>>>>>>>>>> 3) It can't have sub monitor groups
>>>>>>>
>>>>>>> Why not?
>>>>>>
>>>>>> Ditto. With the new approach mentioned above, we can enable it in the
>>>>>> default group as well.
>>>>>>
>>>>>>>
>>>>>>>>>> 4) It can't be pseudo-locked
>>>>>>>>>
>>>>>>>>> Yes.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Would a potential use case involve putting *all* tasks into the PLZA
>>>>>>>>>> group? That would avoid any additional context switch overhead, as the
>>>>>>>>>> PLZA MSR would never need to change.
>>>>>>>>>
>>>>>>>>> Yes. That can be one use case.
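A rough sketch of why that helps: if the context-switch path caches the last PLZA_EN value written on each CPU, similar in spirit to how resctrl already skips redundant MSR_IA32_PQR_ASSOC writes, then a system where every task uses PLZA writes the MSR once per CPU and never again. NR_CPUS, plza_switch_to() and the write counter are hypothetical stand-ins, and the MSR write is only simulated.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical per-CPU cache of the PLZA_EN value last written. */
static bool plza_en_cached[NR_CPUS];
static unsigned long msr_writes;	/* counts simulated MSR writes */

/* Stand-in for writing MSR_IA32_PQR_PLZA_ASSOC; it only records the write. */
static void write_plza_en(int cpu, bool en)
{
	plza_en_cached[cpu] = en;
	msr_writes++;
}

/* Called on context switch with whether the incoming task uses PLZA. */
static void plza_switch_to(int cpu, bool next_uses_plza)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (plza_en_cached[cpu] != next_uses_plza)
		write_plza_en(cpu, next_uses_plza);
}
```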
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> If that is the case, maybe for the PLZA group we should allow the user
>>>>>>>>>> to do:
>>>>>>>>>>
>>>>>>>>>> # echo '*' > tasks
>>>>>>>
>>>>>>> Dedicating a resource group to "PLZA" seems restrictive while also adding
>>>>>>> many complications, since this designation makes the resource group
>>>>>>> behave differently and thus the files need extra "treatments" to handle
>>>>>>> this "PLZA" designation.
>>>>>>>
>>>>>>> I am wondering if it would not be simpler to introduce just one new file,
>>>>>>> for example "tasks_cpl0", in both CTRL_MON and MON groups. When user space
>>>>>>> writes a task ID to the file it "enables" PLZA for this task, and that
>>>>>>> group's CLOSID and RMID become the associated task's "PLZA" CLOSID and
>>>>>>> RMID. This gives user space the flexibility to use the same resource
>>>>>>> group to manage user space and kernel space allocations while also
>>>>>>> supporting various monitoring use cases. This still supports the
>>>>>>> "dedicate a resource group to PLZA" use case, where user space can create
>>>>>>> a new resource group with certain allocations but the "tasks" file will
>>>>>>> be empty and "tasks_cpl0" contains the tasks needing to run with the
>>>>>>> resource group's allocations when in CPL0.
>>>>>>
>>>>>> Yes. We should be able to do that. We need both tasks_cpl0 and cpus_cpl0.
>>>>>>
>>>>>> We need to make sure only one group can be configured in the system, and
>>>>>> not allow it in other groups when it is already enabled.
>>>>>
>>>>> As I understand it, this means that only one group can have content in its
>>>>> tasks_cpl0/tasks_kernel file. There should not be any special handling for
>>>>> the remaining files of the resource group, since the resource group is not
>>>>> dedicated to kernel work and can also be used as a user space resource
>>>>> group. If user space wants to create a dedicated kernel resource group,
>>>>> there can be a new resource group with an empty tasks file.
>>>>>
>>>>> hmmm ... but if user space writes a task ID to a tasks_cpl0/tasks_kernel
>>>>> file, then resctrl would need to create new syntax to remove that task ID.
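The bookkeeping this implies could look roughly like the following: a single owner group for tasks_cpl0 content, released once its last task is removed. The structs, tasks_cpl0_add() and tasks_cpl0_remove() are hypothetical; how user space would actually request removal is the open question raised above, so the sketch only models the state.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified models; not the kernel's task_struct or rdtgroup. */
struct group {
	const char *name;
	int nr_cpl0_tasks;
};

struct task {
	int pid;
	struct group *cpl0_group;
};

/* The one group whose tasks_cpl0 file may currently have content. */
static struct group *cpl0_owner;

/* Hypothetical handler for writing a task ID to @g's tasks_cpl0 file. */
static int tasks_cpl0_add(struct group *g, struct task *t)
{
	if (cpl0_owner && cpl0_owner != g)
		return -EBUSY;	/* PLZA already configured for another group */
	if (t->cpl0_group == g)
		return 0;	/* task already assigned to this group */
	t->cpl0_group = g;
	g->nr_cpl0_tasks++;
	cpl0_owner = g;
	return 0;
}

/* Removing the last task releases ownership so another group can use PLZA. */
static void tasks_cpl0_remove(struct task *t)
{
	struct group *g = t->cpl0_group;

	if (!g)
		return;
	t->cpl0_group = NULL;
	assert(g->nr_cpl0_tasks > 0);
	if (--g->nr_cpl0_tasks == 0)
		cpl0_owner = NULL;
}
```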
>>>>>
>>>>> Possibly MPAM can build on this by allowing user space to write to multiple
>>>>> tasks_cpl0/tasks_kernel files? (and the next version of PLZA may too)
>>>>>
>>>>> Reinette
>>>>>
>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Babu
>>>>>>
>>>>>>>
>>>>>>> Reinette
>>>>>>>
>>>>>>> [1] https://lore.kernel.org/lkml/aXpgragcLS2L8ROe@agluck-desk3/
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Ben
>>>>
>>>>
>>>
>>
>> Thanks,
>>
>> Ben
>>

Thanks,

Ben