Re: [RFC PATCH 13/19] x86/resctrl: Add PLZA state tracking and context switch handling

From: Reinette Chatre

Date: Tue Feb 17 2026 - 18:56:19 EST


Hi Tony,

On 2/17/26 2:52 PM, Luck, Tony wrote:
> On Tue, Feb 17, 2026 at 02:37:49PM -0800, Reinette Chatre wrote:
>> Hi Tony,
>>
>> On 2/17/26 1:44 PM, Luck, Tony wrote:
>>>>>>> I'm not sure if this would happen in the real world or not.
>>>>>>
>>>>>> Ack. I would like to echo Tony's request for feedback from resctrl users
>>>>>> https://lore.kernel.org/lkml/aYzcpuG0PfUaTdqt@agluck-desk3/
>>>>>
>>>>> Indeed. This is all getting a bit complicated.
>>>>>
>>>>
>>>> ack
>>>
>>> We have several proposals so far:
>>>
>>> 1) Ben's suggestion to use the default group (either with a Babu-style
>>> "plza" file just in that group, or a configuration file under "info/").
>>>
>>> This is easily the simplest for implementation, but has no flexibility.
>>> Also requires users to move all the non-critical workloads out to other
>>> CTRL_MON groups. Doesn't steal a CLOSID/RMID.
>>>
>>> 2) My thoughts are for a separate group that is only used to configure
>>> the schemata. This does allocate a dedicated CLOSID/RMID pair. Those
>>> are used for all tasks when in kernel mode.
>>>
>>> No context switch overhead. Has some flexibility.
>>>
>>> 3) Babu's RFC patch. Designates an existing CTRL_MON group as the one
>>> that defines kernel CLOSID/RMID. Tasks and CPUs can be assigned to this
>>> group in addition to belonging to another group that defines schemata
>>> resources when running in non-kernel mode.
>>> Tasks aren't required to be in the kernel group, in which case they
>>> keep the same CLOSID in both user and kernel mode. When used in this
>>> way there will be context switch overhead when changing between tasks
>>> with different kernel CLOSID/RMID.
>>>
>>> 4) Even more complex scenarios with more than one user configurable
>>> kernel group to give more options on resources available in the kernel.
>>>
>>>
>>> I had a quick pass at coding my option "2". My UI to designate the
>>> group to use for kernel mode is to reserve the name "kernel_group"
>>> when making CTRL_MON groups. Some tweaks to avoid creating the
>>> "tasks", "cpus", and "cpus_list" files (which might be done more
>>> elegantly), and "mon_groups" directory in this group.
>>
>> Should the decision of whether context switch overhead is acceptable
>> not be left up to the user?
>
> When someone comes up with a convincing use case to support one set of
> kernel resources when interrupting task A, and a different set of
> resources when interrupting task B, we should certainly listen.

Absolutely. Someone can come up with such a use case at any time though. This
could be, and as has happened with some other resctrl interfaces likely will be,
after this feature has been supported for a few kernel versions. What timeline
should we give which users to share their use cases with us? Even if we do hear
from some users, will that guarantee that no such use case will arise in the
future? Such predictions of usage are difficult for me and I thus find it simpler
to think of flexible ways to enable the features that we know the hardware supports.

This does not mean that a full featured solution needs to be implemented from day 1.
Even if folks believe there are "no valid use cases" today, resctrl still needs to
prepare for how it can grow to support full hardware capability and hardware designs
in the future.

Also, please consider not just resources for kernel work but also monitoring for
kernel work. I do think, for example, a reasonable use case may be to determine
how much memory bandwidth the kernel uses on behalf of certain tasks.

>> I assume that, just like what is currently done for x86's MSR_IA32_PQR_ASSOC,
>> the needed registers will only be updated if there is a new CLOSID/RMID needed
>> for kernel space.
>
> Babu's RFC does this.

Right.

>
>> Are you suggesting that just this checking itself is too
>> expensive to justify giving user space more flexibility by fully enabling what
>> the hardware supports? If resctrl does draw such a line to not enable what
>> hardware supports it should be well justified.
>
> The check is likely light weight (as long as the variables to be
> compared reside in the same cache lines as the existing CLOSID
> and RMID checks). So if there is a use case for different resources
> when in kernel mode, then taking this path will be fine.

Why limit this to knowing about a use case? As I understand it, this feature can be
supported in a flexible way without introducing additional context switch overhead
if the user prefers to use just one allocation for all kernel work. By being
configurable and allowing resctrl to support more use cases in the future, resctrl
does not paint itself into a corner. This allows resctrl to grow support so that
the user can use all capabilities of the hardware with the understanding that it will
increase context switch time.
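
To illustrate the kind of check being discussed, below is a minimal userspace
sketch, not taken from Babu's RFC. All names (kmode_state, task_kmode,
fake_hw_write, kmode_sched_in) are hypothetical, and the register write is
simulated with a counter rather than a real MSR access. It only shows that
when every task shares one kernel CLOSID/RMID the context switch path reduces
to a compare, and the write is paid only by users who configure different
kernel allocations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU cache of the kernel-mode values last written. */
struct kmode_state {
	uint32_t cur_closid;
	uint32_t cur_rmid;
	unsigned long hw_writes;	/* counts simulated register writes */
};

/* Hypothetical per-task kernel-mode CLOSID/RMID association. */
struct task_kmode {
	uint32_t closid;
	uint32_t rmid;
};

/* Stand-in for the register update; real code would write an MSR here. */
static void fake_hw_write(struct kmode_state *st, uint32_t closid, uint32_t rmid)
{
	st->cur_closid = closid;
	st->cur_rmid = rmid;
	st->hw_writes++;
}

/*
 * Context switch hook: compare against the cached values and only
 * touch the (simulated) hardware when something actually changes.
 * Returns true if a write was needed.
 */
static bool kmode_sched_in(struct kmode_state *st, const struct task_kmode *next)
{
	if (st->cur_closid == next->closid && st->cur_rmid == next->rmid)
		return false;

	fake_hw_write(st, next->closid, next->rmid);
	return true;
}

/* Small self-check: two tasks sharing one kernel allocation, one that differs. */
static void kmode_demo(void)
{
	struct kmode_state st = { 0, 0, 0 };
	struct task_kmode a = { 1, 1 }, b = { 1, 1 }, c = { 2, 5 };

	assert(kmode_sched_in(&st, &a));	/* first use: write needed */
	assert(!kmode_sched_in(&st, &b));	/* same values: compare only */
	assert(kmode_sched_in(&st, &c));	/* different values: write needed */
	assert(st.hw_writes == 2);
}
```

With a single kernel allocation every switch after the first takes the early
return, which is why the flexible design need not cost anything for users who
do not ask for per-task kernel resources.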

Reinette