Re: [RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling
From: Luck, Tony
Date: Wed Feb 11 2026 - 14:47:32 EST
On Wed, Feb 11, 2026 at 04:40:32PM +0000, Ben Horgan wrote:
> Hi,
>
> Thanks for including me.
>
> On Tue, Feb 10, 2026 at 10:04:48AM -0800, Reinette Chatre wrote:
> > +Ben and Drew
> >
> > On 2/10/26 8:17 AM, Reinette Chatre wrote:
> > > Hi Babu,
> > >
> > > On 1/28/26 9:44 AM, Moger, Babu wrote:
> > >>
> > >>
> > >> On 1/28/2026 11:41 AM, Moger, Babu wrote:
> > >>>> On Wed, Jan 28, 2026 at 10:01:39AM -0600, Moger, Babu wrote:
> > >>>>> On 1/27/2026 4:30 PM, Luck, Tony wrote:
> > >>>> Babu,
> > >>>>
> > >>>> I've read a bit more of the code now and I think I understand more.
> > >>>>
> > >>>> Some useful additions to your explanation.
> > >>>>
> > >>>> 1) Only one CTRL group can be marked as PLZA
> > >>>
> > >>> Yes. Correct.
> > >
> > > Why limit it to one CTRL_MON group and why not support it for MON groups?
> > >
> > > Limiting it to a single CTRL group seems restrictive in a few ways:
> > > 1) It requires that the "PLZA" group has a dedicated CLOSID. This reduces the
> > > number of use cases that can be supported. Consider, for example, an existing
> > > "high priority" resource group and a "low priority" resource group. The user may
> > > just want to let the tasks in the "low priority" resource group run as "high priority"
> > > when in CPL0. This of course may depend on what resources are allocated, for example
> > > cache may need more care, but if, for example, user is only interested in memory
> > > bandwidth allocation this seems a reasonable use case?
> > > 2) Similar to what Tony [1] mentioned this does not enable what the hardware is
> > > capable of in terms of number of different control groups/CLOSID that can be
> > > assigned to MSR_IA32_PQR_PLZA_ASSOC. Why limit PLZA to one CLOSID?
> > > 3) The feature seems to support RMID in MSR_IA32_PQR_PLZA_ASSOC similar to
> > > MSR_IA32_PQR_ASSOC. With this, it should be possible for user space to, for
> > > example, create a resource group that contains tasks of interest and create
> > > a monitor group within it that monitors all tasks' bandwidth usage when in CPL0.
> > > This will give user space better insight into system behavior and from what I can
> > > tell is supported by the feature but not enabled?
> > >
> > >>>
> > >>>> 2) It can't be the root/default group
> > >>>
> > >>> This is something I added to keep the default group in a un-disturbed,
> > >
> > > Why was this needed?
> > >
> > >>>
> > >>>> 3) It can't have sub monitor groups
> > >
> > > Why not?
> > >
> > >>>> 4) It can't be pseudo-locked
> > >>>
> > >>> Yes.
> > >>>
> > >>>>
> > >>>> Would a potential use case involve putting *all* tasks into the PLZA group? That
> > >>>> would avoid any additional context switch overhead as the PLZA MSR would never
> > >>>> need to change.
> > >>>
> > >>> Yes. That can be one use case.
> > >>>
> > >>>>
> > >>>> If that is the case, maybe for the PLZA group we should allow user to
> > >>>> do:
> > >>>>
> > >>>> # echo '*' > tasks
> > >
> > > Dedicating a resource group to "PLZA" seems restrictive while also adding many
> > > complications since this designation makes resource group behave differently and
> > > thus the files need to get extra "treatments" to handle this "PLZA" designation.
> > >
> > > I am wondering if it will not be simpler to introduce just one new file, for example
> > > "tasks_cpl0" in both CTRL_MON and MON groups. When user space writes a task ID to the
> > > file it "enables" PLZA for this task and that group's CLOSID and RMID is the associated
> > > task's "PLZA" CLOSID and RMID. This gives user space the flexibility to use the same
> > > resource group to manage user space and kernel space allocations while also supporting
> > > various monitoring use cases. This still supports the "dedicate a resource group to PLZA"
> > > use case where user space can create a new resource group with certain allocations but the
> > > "tasks" file will be empty and "tasks_cpl0" contains the tasks needing to run with
> > > the resource group's allocations when in CPL0.
>
> If there is a "tasks_cpl0" then I'd expect a "cpus_cpl0" too.
>
> >
> > It looks like MPAM has a few more capabilities here and the Arm levels are numbered differently
> > with EL0 meaning user space. We should thus aim to keep things as generic as possible. For example,
> > instead of CPL0 using something like "kernel" or ... ?
>
> Yes, PLZA does open up more possibilities for MPAM usage. I've talked to James
> internally and here are a few thoughts.
>
> If the use case is just an option to run all tasks with the same closid/rmid
> (partid/pmg) configuration when they are running in the kernel then I'd favour a
> mount option. The resctrl filesystem interface doesn't need to change and
> userspace software doesn't need to change. This could either take away a
> closid/rmid from userspace and dedicate it to the kernel or perhaps have a
> policy to have the default group as the kernel group. If you use the default
> configuration, at least for MPAM, the kernel may not be running at the highest
> priority as a minimum bandwidth can be used to give a priority boost. (Once we
> have a resctrl schema for this.)

I'm a big fan of this use case. It's easy to understand why users would
want this. It avoids the issue that syscalls, page-faults, and
interrupts from a task with very limited resources will spend ages in
the kernel. Users have complained about the priority inversions that
this can cause.

It also has a simpler implementation. No changes to the context switch
code. On x86 some simple method to steal a CLOSID and configure
resources for that CLOSID.

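A minimal user-space sketch of what "steal a CLOSID" could look like,
assuming a bitmap of free CLOSIDs and a one-time reservation at mount.
Every name here (closid_pool, closid_reserve_for_kernel, NUM_CLOSIDS) is
hypothetical and only models the idea; it is not the resctrl code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, not kernel code. 16 CLOSIDs is an assumed limit. */
#define NUM_CLOSIDS 16

struct closid_pool {
	uint32_t free_map;	/* bit n set => CLOSID n is free */
	int kernel_closid;	/* CLOSID reserved for kernel mode, or -1 */
};

static void closid_pool_init(struct closid_pool *p)
{
	/* CLOSID 0 belongs to the default group, so it is never free. */
	p->free_map = ((1u << NUM_CLOSIDS) - 1u) & ~1u;
	p->kernel_closid = -1;
}

/*
 * Reserve one CLOSID for kernel mode. Meant to run once, at mount time,
 * so nothing has to change on the context switch path.
 */
static int closid_reserve_for_kernel(struct closid_pool *p)
{
	int closid;

	assert(p->kernel_closid == -1);	/* reserve at most once */

	/* Take the highest free CLOSID so low IDs stay available to users. */
	for (closid = NUM_CLOSIDS - 1; closid > 0; closid--) {
		if (p->free_map & (1u << closid)) {
			p->free_map &= ~(1u << closid);
			p->kernel_closid = closid;
			return closid;
		}
	}
	return -1;	/* nothing left to steal */
}

/* Ordinary allocation for a new user-visible control group. */
static int closid_alloc(struct closid_pool *p)
{
	int closid;

	for (closid = 1; closid < NUM_CLOSIDS; closid++) {
		if (p->free_map & (1u << closid)) {
			p->free_map &= ~(1u << closid);
			return closid;
		}
	}
	return -1;
}
```

With the reservation done up front, user space simply sees one fewer
CLOSID available for new control groups.
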
>
> It could be useful to have something a bit more featureful though. Is there a

Many things have theoretical use cases. I'd like to hear from some
resctrl users whether they will make use of these extra features.

Babu's RFC allows for some tasks to be in the PLZA group while others
will run in kernel mode with the same resources that are granted to
the CTRL group they belong to.

Reinette asked[1] whether the PLZA mode should be extended to multiple
CTRL groups and their child MON groups for even greater
flexibility.

[1] https://lore.kernel.org/all/7a4ea07d-88e6-4f0f-a3ce-4fd97388cec4@xxxxxxxxx/

> need for the two mappings, task->cpl0 config and task->cpl1 to be independent or
> would a task->(cpl0 config, cpl1 config) be sufficient? It seems awkward that
> it's not a single write to move a task. If a single mapping is sufficient, then
> there could be a single new file, kernel_group, per CTRL_MON group (maybe MON
> groups too) as suggested above but, rather than a task, that file could hold a
> path to the CTRL_MON/MON group that provides the kernel configuration for tasks
> running in that group. So that this can be transparent to existing software an
> empty string can mean use the current group's configuration when in the kernel
> (as well as for userspace). A slash, /, could be used to refer to the default
> group. This would give something like the below under /sys/fs/resctrl.
>
> .
> ├── cpus
> ├── tasks
> ├── ctrl1
> │   ├── cpus
> │   ├── kernel_group -> mon_groups/mon1
> │   └── tasks
> ├── kernel_group -> ctrl1
> └── mon_groups
>     └── mon1
>         ├── cpus
>         ├── kernel_group -> ctrl1
>         └── tasks
>
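A rough sketch of the lookup that the kernel_group layout above seems to
imply, assuming paths are relative to the resctrl mount and that the
reference is followed for one hop only (ctrl1 and mon1 point at each
other in the tree, so chaining would never terminate). The struct and
function names are hypothetical and this is a user-space model, not a
proposed implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a resctrl group for illustration only. */
struct rgroup {
	const char *path;		/* relative to the mount; "/" is the default group */
	const char *kernel_group;	/* "" = this group, "/" = default group, else a path */
};

/* Mirrors the example tree, plus ctrl2 to show the empty-string case. */
static const struct rgroup example_groups[] = {
	{ "/",               "ctrl1" },
	{ "ctrl1",           "mon_groups/mon1" },
	{ "mon_groups/mon1", "ctrl1" },
	{ "ctrl2",           "" },
};
#define EXAMPLE_NGROUPS (sizeof(example_groups) / sizeof(example_groups[0]))

static const struct rgroup *rgroup_lookup(const struct rgroup *tbl, size_t n,
					  const char *path)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i].path, path) == 0)
			return &tbl[i];
	return NULL;
}

/*
 * Return the group whose configuration applies while a task that
 * belongs to @g runs in the kernel, or NULL if kernel_group names a
 * group that does not exist.
 */
static const struct rgroup *kernel_config_group(const struct rgroup *tbl,
						size_t n,
						const struct rgroup *g)
{
	assert(g != NULL);

	/* Empty string: same group for user space and kernel. */
	if (g->kernel_group[0] == '\0')
		return g;

	/* One hop only: the target's own kernel_group is not followed. */
	return rgroup_lookup(tbl, n, g->kernel_group);
}
```

In this model a task in ctrl1 runs with mon1's configuration in the
kernel, a task in mon1 runs with ctrl1's, and a task in ctrl2 is
unchanged from today's behavior.
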
> >
> > I have not read anything about the RISC-V side of this yet.
> >
> > Reinette
> >
> > >
> > > Reinette
> > >
> > > [1] https://lore.kernel.org/lkml/aXpgragcLS2L8ROe@agluck-desk3/
> >
>
> Thanks,
>
> Ben
-Tony