Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept

From: Reinette Chatre

Date: Tue Jun 09 2026 - 13:53:00 EST


Hi Ben,

On 6/9/26 9:37 AM, Ben Horgan wrote:
> On 6/9/26 16:28, Reinette Chatre wrote:
>> On 6/9/26 3:10 AM, Ben Horgan wrote:
>>> On 6/8/26 17:16, Reinette Chatre wrote:


>>> I don't see the advantage of emulating MB with both MIN and MAX. Just going by
>>> the MPAM specification, a system keeping MIN at 0 and just setting MAX from MB,
>>> (MIN=0, MAX=MB) should behave the same as one always setting both, (MIN=MB,
>>> MAX=MB). In the MIN=0 case there is never any high preference traffic and in the
>>> MIN=MAX=MB case there is never any medium preference traffic. It seemed best to
>>> not rely on any platform specific heuristics to try and guess what's better and
>>> just wait until the time we could support MB_MIN in resctrl (and leave the
>>> decision up to the user). My expectation was that this would be the simplest
>>> course of action.
>>
>> This sounds fair. Two observations:
>> - The hierarchy exposed by resctrl may be different on systems that have the "same"
>> controls.
>> For example, on an MPAM system (if I understand correctly) the user may see:
>>   info/
>>   └── MB/
>>       └── resource_schemata/
>>           ├── MB/
>>           │   └── MB_MAX/
>>           └── MB_MIN/
>
> Yes, this matches my understanding.
>
>>
>> Compared with a possible implementation on Intel that looks like:
>>   info/
>>   └── MB/
>>       └── resource_schemata/
>>           ├── MB/
>>           │   └── MB_OPT/
>>           ├── MB_MAX/
>>           └── MB_MIN/
>
> Not sure if my understanding is correct here...
> In the kernel today is it rdt max that backs MB? (Ignoring the sw controller)

resctrl does not have support for the RDT "MAX" controller yet. Since resctrl was
created as part of enabling RDT, the resctrl MB control maps exactly to RDT's
original percentage based memory delay value, which is approximate. Newer hardware
supports three controls: optimal, minimum, and maximum. These controls have finer
granularity than what the default percentage based control supports so emulation
is needed.
So far I assumed that on these systems the default MB control would be emulated
by the new "optimal" control but after these exchanges I can see there being an
argument for it to be emulated by the new "maximum" control also. Apart from it
implying a cap there is also the idea that the "maximum" control is more likely to
be available on all platforms.
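
To illustrate what I mean by emulation, here is a rough user-space sketch of
mapping the percentage based MB value onto a finer-grained control and back. It
is not kernel code and not how any hardware is specified: percent_to_ctrl(),
ctrl_to_percent() and the ctrl_max parameter (the largest value the
finer-grained control accepts) are names I made up for this example.

```c
#include <stdint.h>

/*
 * Hypothetical helper: scale a percentage (0..100) onto a control that
 * accepts values 0..ctrl_max, rounding to the nearest control value.
 */
static uint32_t percent_to_ctrl(uint32_t percent, uint32_t ctrl_max)
{
	if (percent > 100)
		percent = 100;

	return (uint32_t)(((uint64_t)percent * ctrl_max + 50) / 100);
}

/*
 * Hypothetical helper: map a control value back to the nearest percentage,
 * for example when reading the emulated MB value back.
 */
static uint32_t ctrl_to_percent(uint32_t val, uint32_t ctrl_max)
{
	if (ctrl_max == 0)
		return 0;
	if (val > ctrl_max)
		val = ctrl_max;

	return (uint32_t)(((uint64_t)val * 100 + ctrl_max / 2) / ctrl_max);
}
```

With an assumed ctrl_max of 255, MB=50 maps to control value 128 and reads back
as 50. Whichever control backs MB (optimal or maximum), the scaling would look
the same; only the meaning of the resulting value changes.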


> If so wouldn't the meaning of MB change within the same platform on a kernel
> upgrade once the rdt optimal support is added?

Good point. Thank you for bringing this up. While the existing RDT MB control is
approximate the RDT spec does contain the statement "... should be viewed as a
maximum bandwidth “cap” per-CLOS." which is quite clear. I'll follow up with
RDT folks on this.

...

>>> We can work with that. For MPAM it's writing all 1s to the register which for the
>>> minimum case represents ((2**mbw_min_wd)/(2**mbw_min_wd)) * 100 %
>>>
>>>> Just emulating MB with MAX control
>>>> does not seem to eliminate this problem since between the fs and arch resctrl still needs
>>>> to ensure that when user space writes a control value to the MIN control that it is valid
>>>> for the underlying MPAM system.
>>>>
>>>> It almost sounds as though there is an attempt to eliminate resctrl's usage of a "max"
>>>> value for the MIN control since that is effectively unknown to MPAM but that does
>>>> not look possible to me?
>>>
>>> Sorry but I haven't understood what you're saying. What does "resctrl's usage of a
>>> "max" value for MIN control" mean?
>>
>> Basically it is resctrl fs's validation of user input. Specifically, in bw_validate() where
>> the fs does this range check:
>> if (bw < r->membw.min_bw || bw > r->membw.max_bw)
>>
>> resctrl fs thus uses the "max" value of a control for user input checking and it seemed to
>> me that it may be difficult for MPAM to learn that "max" from all systems but it sounds as
>> though the plan is instead to use the max that the architecture supports?
>
> Yes, can just size the max based on the number of configuration bits in the h/w.

Ack. Thank you for the clarification.
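
Just to confirm we are talking about the same thing, below is a minimal
user-space sketch of how I read this: the "max" used for input validation is
derived from the number of configuration bits, and the range check is the same
comparison as the one quoted from bw_validate() above. struct ctrl_range,
ctrl_max_from_width() and ctrl_value_valid() are made-up names for
illustration, and the sketch works on raw control values whereas resctrl
applies the check in the units of the schema.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the min/max that the fs validates against. */
struct ctrl_range {
	uint32_t min_bw;
	uint32_t max_bw;
};

/*
 * Largest value representable in a field of 'wd' configuration bits,
 * that is, the field with all bits set to 1.
 */
static uint32_t ctrl_max_from_width(unsigned int wd)
{
	if (wd == 0)
		return 0;
	if (wd >= 32)
		return UINT32_MAX;

	return (UINT32_C(1) << wd) - 1;
}

/* Same comparison as the quoted range check, expressed as a predicate. */
static bool ctrl_value_valid(const struct ctrl_range *r, uint32_t bw)
{
	return !(bw < r->min_bw || bw > r->max_bw);
}
```

For an assumed 8-bit field the max becomes 255, so a user-supplied 255 passes
the check and 256 is rejected without needing any platform specific knowledge.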

Reinette