Re: [RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling

From: Stephane Eranian

Date: Wed Feb 18 2026 - 01:23:10 EST


On Tue, Feb 17, 2026 at 7:56 AM Ben Horgan <ben.horgan@xxxxxxx> wrote:
>
> Hi Babu,
>
> On 2/16/26 22:52, Moger, Babu wrote:
> > Hi Ben,
> >
> > On 2/16/2026 9:41 AM, Ben Horgan wrote:
> >> Hi Babu, Reinette,
> >>
> >> On 2/14/26 00:10, Reinette Chatre wrote:
> >>> Hi Babu,
> >>>
> >>> On 2/13/26 8:37 AM, Moger, Babu wrote:
> >>>> Hi Reinette,
> >>>>
> >>>> On 2/10/2026 10:17 AM, Reinette Chatre wrote:
> >>>>> Hi Babu,
> >>>>>
> >>>>> On 1/28/26 9:44 AM, Moger, Babu wrote:
> >>>>>>
> >>>>>>
> >>>>>> On 1/28/2026 11:41 AM, Moger, Babu wrote:
> >>>>>>>> On Wed, Jan 28, 2026 at 10:01:39AM -0600, Moger, Babu wrote:
> >>>>>>>>> On 1/27/2026 4:30 PM, Luck, Tony wrote:
> >>>>>>>> Babu,
> >>>>>>>>
> >>>>>>>> I've read a bit more of the code now and I think I understand more.
> >>>>>>>>
> >>>>>>>> Some useful additions to your explanation.
> >>>>>>>>
> >>>>>>>> 1) Only one CTRL group can be marked as PLZA
> >>>>>>>
> >>>>>>> Yes. Correct.
> >>>>>
> >>>>> Why limit it to one CTRL_MON group and why not support it for MON
> >>>>> groups?
> >>>>
> >>>> There can be only one PLZA configuration in a system. The values in
> >>>> the MSR_IA32_PQR_PLZA_ASSOC register (RMID, RMID_EN, CLOSID,
> >>>> CLOSID_EN) must be identical across all logical processors. The only
> >>>> field that may differ is PLZA_EN.
> >>
> >> Does this have any effect on hypervisors?
> >
> > Because the hypervisor runs at CPL0, there could be some use case. I have
> > not completely understood that part.
> >
> >>
> >>>
> >>> ah - this is a significant part that I missed. Since this is a per-
> >>> CPU register it seems
> >>
> >> I also missed that.
> >>
> >>> to have the ability for expanded use in the future where different
> >>> CLOSID and RMID may be
> >>> written to it? Is PLZA leaving room for such future enhancement or
> >>> does the spec contain
> >>> the text that states "The values in the MSR_IA32_PQR_PLZA_ASSOC
> >>> register (RMID, RMID_EN,
> >>> CLOSID, CLOSID_EN) must be identical across all logical processors."?
> >>> That is, "forever
> >>> and always"?
> >>>
> >>> If I understand correctly MPAM could have different PARTID and PMG
> >>> for kernel use so we
> >>> need to consider these different architectural behaviors.
> >>
> >> Yes, MPAM has a per-cpu register MPAM1_EL1.
> >>
> >
> > oh ok.
> >
> >>>
> >>>> I was initially unsure which RMID should be used when PLZA is
> >>>> enabled on MON groups.
> >>>>
> >>>> After re-evaluating, enabling PLZA on MON groups is still feasible:
> >>>>
> >>>> 1. Only one group in the system can have PLZA enabled.
> >>>> 2. If PLZA is enabled on CTRL_MON group then we cannot enable PLZA
> >>>> on MON group.
> >>>> 3. If PLZA is enabled on the CTRL_MON group, then the CLOSID and
> >>>> RMID of the CTRL_MON group can be written.
> >>>> 4. If PLZA is enabled on a MON group, then the CLOSID of the
> >>>> CTRL_MON group can be used, while the RMID of the MON group can be
> >>>> written.
> >>
> >> Given that CLOSID and RMID are fixed once in the PLZA configuration
> >> could this be simplified by just assuming they have the values of the
> >> default group, CLOSID=0 and RMID=0 and let the user base their
> >> configuration on that?
> >>
> >
> > I didn't understand this question. There are 16 CLOSIDs and 1024 RMIDs.
> > We can use any one of these to enable PLZA. It is not fixed in that sense.
>
> Sorry, I wasn't clear. What I'm trying to understand is what you gain by
> this flexibility. Given that the values CLOSID and the RMID are just
> identifiers within the hardware and have only the meaning they are given
> by the grouping and controls/monitors set up by resctrl (or any other
> software interface) would you lose anything by just saying the PLZA
> group has CLOSID=0 and RMID=0. Is there value in changing the PLZA
> CLOSID and RMID or can the same effect happen by just changing the
> resctrl configuration?
>
Not quite.
When you enter the kernel, you want to run unthrottled to avoid
priority inversion situations.
But at the same time, you still want to be able to monitor the
bandwidth for your thread or job, i.e., keep the same
RMID you have in user space.

The kernel is by construction shared by all threads running in the
system. It should run unrestricted or with the
bandwidth allocated to the highest priority tasks.

PLZA should not change the RMID at all.

You could obtain the same effect by changing the quota for each CLOSID
on kernel entry. But that would likely be more expensive,
and you would have to do this at every possible entry and exit point
(restoring the original quota on exit).
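
To make the point concrete, here is a rough sketch of the model I have in
mind. It is only an illustration: the struct and function names are made
up, this is not the real MSR bit layout, and nothing here comes from the
patch series. I am also assuming that CLOSID_EN and RMID_EN gate their
fields independently, which I have not checked against the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-task association (not the MSR layout). */
struct pqr_assoc {
	uint32_t closid;
	uint32_t rmid;
};

/* Hypothetical model of the PLZA association; field names follow the thread. */
struct plza_assoc {
	bool plza_en;
	bool closid_en;
	bool rmid_en;
	uint32_t closid;
	uint32_t rmid;
};

/*
 * Return the CLOSID/RMID pair assumed to be in effect.
 * Outside CPL0, or with PLZA disabled, the task's own pair is used.
 * At CPL0 with PLZA enabled, each field is overridden only if its
 * enable bit is set (assumption, see above).
 */
static struct pqr_assoc effective_ids(struct pqr_assoc task,
				      struct plza_assoc plza, bool at_cpl0)
{
	struct pqr_assoc eff = task;

	if (at_cpl0 && plza.plza_en) {
		if (plza.closid_en)
			eff.closid = plza.closid;
		if (plza.rmid_en)
			eff.rmid = plza.rmid;
	}
	return eff;
}

/* Usage: kernel runs with the PLZA CLOSID but keeps the task's RMID. */
static void example(void)
{
	struct pqr_assoc task = { .closid = 5, .rmid = 42 };
	struct plza_assoc plza = {
		.plza_en = true, .closid_en = true, .rmid_en = false,
		.closid = 1, .rmid = 0,
	};
	struct pqr_assoc k = effective_ids(task, plza, true);

	assert(k.closid == 1);	/* allocation switches in the kernel */
	assert(k.rmid == 42);	/* monitoring stays with the task */
}
```

With RMID_EN left clear, bandwidth consumed in the kernel is still charged
to the task's user space RMID, and nothing has to be rewritten on entry or
exit.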



> I was also wondering if using the default group this way would mean that
> you wouldn't need to reserve the group for only kernel use.
>
> >
> >
> >>>>
> >>>> I am thinking this approach should work.
> >>>>
> >>>>>
> >>>>> Limiting it to a single CTRL group seems restrictive in a few ways:
> >>>>> 1) It requires that the "PLZA" group has a dedicated CLOSID. This
> >>>>> reduces the
> >>>>> number of use cases that can be supported. Consider, for
> >>>>> example, an existing
> >>>>> "high priority" resource group and a "low priority" resource
> >>>>> group. The user may
> >>>>> just want to let the tasks in the "low priority" resource
> >>>>> group run as "high priority"
> >>>>> when in CPL0. This of course may depend on what resources are
> >>>>> allocated, for example
> >>>>> cache may need more care, but if, for example, user is only
> >>>>> interested in memory
> >>>>> bandwidth allocation this seems a reasonable use case?
> >>>>> 2) Similar to what Tony [1] mentioned this does not enable what the
> >>>>> hardware is
> >>>>> capable of in terms of number of different control groups/
> >>>>> CLOSID that can be
> >>>>> assigned to MSR_IA32_PQR_PLZA_ASSOC. Why limit PLZA to one
> >>>>> CLOSID?
> >>>>> 3) The feature seems to support RMID in MSR_IA32_PQR_PLZA_ASSOC
> >>>>> similar to
> >>>>> MSR_IA32_PQR_ASSOC. With this, it should be possible for user
> >>>>> space to, for
> >>>>> example, create a resource group that contains tasks of
> >>>>> interest and create
> >>>>> a monitor group within it that monitors all tasks' bandwidth
> >>>>> usage when in CPL0.
> >>>>> This will give user space better insight into system behavior
> >>>>> and from what I can
> >>>>> tell is supported by the feature but not enabled?
> >>>>
> >>>>
> >>>> Yes, as long as PLZA is enabled on only one group in the entire system
> >>>>
> >>>>>
> >>>>>>>
> >>>>>>>> 2) It can't be the root/default group
> >>>>>>>
> >>>>>>> This is something I added to keep the default group in an
> >>>>>>> undisturbed,
> >>>>>
> >>>>> Why was this needed?
> >>>>>
> >>>>
> >>>> With the new approach mentioned above we can enable it in the
> >>>> default group also.
> >>>>
> >>>>>>>
> >>>>>>>> 3) It can't have sub monitor groups
> >>>>>
> >>>>> Why not?
> >>>>
> >>>> Ditto. With the new approach mentioned above we can enable it in
> >>>> the default group also.
> >>>>
> >>>>>
> >>>>>>>> 4) It can't be pseudo-locked
> >>>>>>>
> >>>>>>> Yes.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> Would a potential use case involve putting *all* tasks into the
> >>>>>>>> PLZA group? That
> >>>>>>>> would avoid any additional context switch overhead as the PLZA
> >>>>>>>> MSR would never
> >>>>>>>> need to change.
> >>>>>>>
> >>>>>>> Yes. That can be one use case.
> >>>>>>>
> >>>>>>>>
> >>>>>>>> If that is the case, maybe for the PLZA group we should allow
> >>>>>>>> user to
> >>>>>>>> do:
> >>>>>>>>
> >>>>>>>> # echo '*' > tasks
> >>>>>
> >>>>> Dedicating a resource group to "PLZA" seems restrictive while also
> >>>>> adding many
> >>>>> complications since this designation makes resource group behave
> >>>>> differently and
> >>>>> thus the files need to get extra "treatments" to handle this "PLZA"
> >>>>> designation.
> >>>>>
> >>>>> I am wondering if it will not be simpler to introduce just one new
> >>>>> file, for example
> >>>>> "tasks_cpl0" in both CTRL_MON and MON groups. When user space
> >>>>> writes a task ID to the
> >>>>> file it "enables" PLZA for this task and that group's CLOSID and
> >>>>> RMID is the associated
> >>>>> task's "PLZA" CLOSID and RMID. This gives user space the
> >>>>> flexibility to use the same
> >>>>> resource group to manage user space and kernel space allocations
> >>>>> while also supporting
> >>>>> various monitoring use cases. This still supports the "dedicate a
> >>>>> resource group to PLZA"
> >>>>> use case where user space can create a new resource group with
> >>>>> certain allocations but the
> >>>>> "tasks" file will be empty and "tasks_cpl0" contains the tasks
> >>>>> needing to run with
> >>>>> the resource group's allocations when in CPL0.
> >>>>
> >>>> Yes. We should be able to do that. We need both tasks_cpl0 and cpus_cpl0.
> >>>>
> >>>> We need to make sure only one group can be configured in the system
> >>>> and not allow it in other groups when it is already enabled.
> >>>
> >>> As I understand this means that only one group can have content in its
> >>> tasks_cpl0/tasks_kernel file. There should not be any special
> >>> handling for
> >>> the remaining files of the resource group since the resource group is
> >>> not
> >>> dedicated to kernel work and can be used as a user space resource
> >>> group also.
> >>> If user space wants to create a dedicated kernel resource group there
> >>> can be
> >>> a new resource group with an empty tasks file.
> >>>
> >>> hmmm ... but if user space writes a task ID to a tasks_cpl0/
> >>> tasks_kernel file then
> >>> resctrl would need to create new syntax to remove that task ID.
> >>>
> >>> Possibly MPAM can build on this by allowing user space to write to
> >>> multiple
> >>> tasks_cpl0/tasks_kernel files? (and the next version of PLZA may too)
> >>>
> >>> Reinette
> >>>
> >>>
> >>>>
> >>>> Thanks
> >>>> Babu
> >>>>
> >>>>>
> >>>>> Reinette
> >>>>>
> >>>>> [1] https://lore.kernel.org/lkml/aXpgragcLS2L8ROe@agluck-desk3/
> >>>>>
> >>>>
> >>>
> >>>
> >>
> >> Thanks,
> >>
> >> Ben
> >>
> >>
> >
>
> Thanks,
>
> Ben
>