Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
From: Reinette Chatre
Date: Wed Jun 10 2026 - 12:21:47 EST

Hi Chenyu,

On 6/10/26 7:27 AM, Chen, Yu C wrote:
> Hi Reinette,
>
> On 6/10/2026 3:09 PM, Chen, Yu C wrote:
>> Hi Reinette,
>>
>> On 6/10/2026 1:41 AM, Reinette Chatre wrote:
>>> Hi Ben,
>>>
>>> On 6/9/26 9:37 AM, Ben Horgan wrote:
>>>> On 6/9/26 16:28, Reinette Chatre wrote:
>>>>> On 6/9/26 3:10 AM, Ben Horgan wrote:
>>>>>> On 6/8/26 17:16, Reinette Chatre wrote:
>>>
>>>
>>>>>> I don't see the advantage of emulating MB with both MIN and MAX. Just going by
>>>>>> the MPAM specification, a system keeping MIN at 0 and just setting MAX from MB,
>>>>>> (MIN=0, MAX=MB) should behave the same as one always setting both, (MIN=MB,
>>>>>> MAX=MB). In the MIN=0 case there is never any high preference traffic and in the
>>>>>> MIN=MAX_MB case there is never any medium preference traffic. It seemed best to
>>>>>> not rely on any platform specific heuristics to try and guess what's better and
>>>>>> just wait til the time we could support MB_MIN in resctrl (and leave the
>>>>>> decision up to the user). My expectation was that this would be the simplest
>>>>>> course of action.
>>>>>
>>>>> This sounds fair. Two observations:
>>>>> - The hierarchy exposed by resctrl may be different on systems that have the "same"
>>>>> controls.
>>>>> For example, on an MPAM system (if I understand correctly) the user may see:
>>>>>     info/
>>>>>     └── MB/
>>>>>         └── resource_schemata/
>>>>>             ├── MB/
>>>>>             │   └── MB_MAX/
>>>>>             └── MB_MIN/
>>>>
>>>> Yes, this matches my understanding.
>>>>
>>>>>
>>>>> Compared with a possible implementation on Intel that looks like:
>>>>>     info/
>>>>>     └── MB/
>>>>>         └── resource_schemata/
>>>>>             ├── MB/
>>>>>             │   └── MB_OPT/
>>>>>             ├── MB_MAX/
>>>>>             └── MB_MIN/
>>>>
>>>> Not sure if my understanding is correct here...
>>>> In the kernel today is it rdt max that backs MB? (Ignoring the sw controller)
>>>
>>> resctrl does not have support for the RDT "MAX" controller yet. Since resctrl was
>>> created as part of enabling RDT the resctrl MB control maps exactly to RDT's
>>> original percentage based memory delay value that is approximate. Newer hardware
>>> supports three controls: optimal, minimum, and maximum. These controls have finer
>>> granularity than what the default percentage based control supports so emulation
>>> is needed.
>>> So far I assumed that on these systems the default MB control would be emulated
>>> by the new "optimal" control but after these exchanges I can see there being an
>>> argument for it to be emulated by the new "maximum" control also. Apart from it
>>> implying a cap there is also the idea that the "maximum" control is more likely to
>>> be available on all platforms.
>>>
>>
>> Regarding the region-aware RDT case, I wonder if we actually need to emulate the
>> legacy MB control using MB_MAX. First, when we refer to the "legacy" for region-aware
>> RDT, I suppose it corresponds to "MSR access" plus "percentage-based control".
>>
>> case 1:
>> If the platform does not support region-aware RDT (no ERDT table is detected),
>> the MB is naturally the "legacy" MB, and the info directory would look like:
>>
>> info
>> └── MB
>>     └── resource_schemata
>>         └── MB
>>
>> case 2:
>> If the platform supports region-aware RDT (i.e., ERDT parsing succeeds),
>> then the structure looks like below:
>>
>> info
>> └── MB
>>     └── resource_schemata
>>         ├── MB <=== legacy
>>         ├── MB_REGION0_OPT
>>         ├── MB_REGION1_OPT
>>         ├── MB_REGION0_MIN
>>         ├── MB_REGION1_MIN
>>         ├── MB_REGION0_MAX
>>         └── MB_REGION1_MAX
>>
>
> This may be slightly off-topic from MAX emulation, but I have another
> thought regarding multi-controllers for rdt_resource:
> As we know, with N regions, an MB resource will have a total of N × 3
> controllers. Given that the current PoC iterates through every controller
> within the resource in resctrl_resource_ctrl_get(), could this increase
> lookup latency?

resctrl_resource_ctrl_get() is only called when the user writes to the
schemata file and, since the user can modify any of the enabled controls with
a write to the schemata file, it is necessary to iterate through all controls
to find a match.
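
To illustrate the shape of that lookup, here is a minimal sketch. It is not
the PoC code: struct demo_ctrl, demo_ctrl_get() and demo_mb_ctrls[] are names
made up for this example, and the control names are borrowed from the two-region
layout quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one control of a resource; not the PoC's structure. */
struct demo_ctrl {
	const char *name;	/* control name as it would appear in a schemata line */
	int enabled;		/* only enabled controls may be matched */
};

/*
 * Linear scan over the controls of one resource. Returns the enabled
 * control whose name matches, or NULL when there is no such control.
 */
static const struct demo_ctrl *demo_ctrl_get(const struct demo_ctrl *ctrls,
					     size_t nr, const char *name)
{
	size_t i;

	assert(ctrls && name);

	for (i = 0; i < nr; i++) {
		if (ctrls[i].enabled && !strcmp(ctrls[i].name, name))
			return &ctrls[i];
	}

	return NULL;
}

/* Legacy control plus two regions with three controls each (N x 3, N = 2). */
static const struct demo_ctrl demo_mb_ctrls[] = {
	{ "MB",             1 },
	{ "MB_REGION0_OPT", 1 },
	{ "MB_REGION1_OPT", 1 },
	{ "MB_REGION0_MIN", 1 },
	{ "MB_REGION1_MIN", 1 },
	{ "MB_REGION0_MAX", 1 },
	{ "MB_REGION1_MAX", 0 },	/* disabled, so never returned */
};
```

The scan is bounded by the number of controls in a single resource and, in this
sketch, runs once per control named in a schemata write.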
>
> I studied the cgroup code and found that each controller for a cgroup
> resource uses a dedicated cftype. For example:
> static struct cftype memory_files[] = {
>     { .name = "min", .write = memory_min_write, .seq_show = memory_min_show },
>     { .name = "max", .write = memory_max_write, .seq_show = memory_max_show },
>     ...
> };

The PoC currently has "static struct rftype ctrl_files[]" that associates an
rftype with every property of a control, which enables all user space interactions
with the individual controls to be direct.
>
> The min/max memory controllers can be accessed in O(1) time using:
> of_cft(of) -> kn->priv, and cft->write(of, buf, ...)
>
> rftype is resctrl's equivalent of cftype, and schemata is currently implemented
> as a single rftype. Would it make sense to define a separate rftype for each
> resctrl controller (or maybe in the future, considering that this is not in a critical path)?

Your suggestion is not clear to me. The schemata file is associated with a control group:
a resource group that has multiple allocations, each backed by a different controller.
I would appreciate it if you could provide more detail.

Reinette