Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept

From: Chen, Yu C

Date: Wed Jun 10 2026 - 10:31:21 EST


Hi Reinette,

On 6/10/2026 3:09 PM, Chen, Yu C wrote:
Hi Reinette,

On 6/10/2026 1:41 AM, Reinette Chatre wrote:
Hi Ben,

On 6/9/26 9:37 AM, Ben Horgan wrote:
On 6/9/26 16:28, Reinette Chatre wrote:
On 6/9/26 3:10 AM, Ben Horgan wrote:
On 6/8/26 17:16, Reinette Chatre wrote:


I don't see the advantage of emulating MB with both MIN and MAX. Just going by
the MPAM specification, a system keeping MIN at 0 and just setting MAX from MB,
(MIN=0, MAX=MB) should behave the same as one always setting both, (MIN=MB,
MAX=MB). In the MIN=0 case there is never any high preference traffic and in the
MIN=MAX=MB case there is never any medium preference traffic. It seemed best not
to rely on any platform-specific heuristics to guess what's better and to
just wait until we can support MB_MIN in resctrl (and leave the
decision up to the user). My expectation was that this would be the simplest
course of action.

This sounds fair. Two observations:
- The hierarchy exposed by resctrl may be different on systems that have the "same"
   controls.
   For example, on an MPAM system (if I understand correctly) the user may see:
   info/
   └── MB/
       └── resource_schemata/
           ├── MB/
           │   └── MB_MAX/
           └── MB_MIN/

Yes, this matches my understanding.


   Compared with a possible implementation on Intel that looks like:
   info/
   └── MB/
       └── resource_schemata/
           ├── MB/
           │   └── MB_OPT/
           ├── MB_MAX/
           └── MB_MIN/

Not sure if my understanding is correct here...
In the kernel today is it rdt max that backs MB? (Ignoring the sw controller)

resctrl does not have support for the RDT "MAX" controller yet. Since resctrl was
created as part of enabling RDT, the resctrl MB control maps exactly to RDT's
original percentage-based memory delay value, which is approximate. Newer hardware
supports three controls: optimal, minimum, and maximum. These controls have finer
granularity than what the default percentage-based control supports, so emulation
is needed.
So far I assumed that on these systems the default MB control would be emulated
by the new "optimal" control but after these exchanges I can see there being an
argument for it to be emulated by the new "maximum" control also. Apart from it
implying a cap there is also the idea that the "maximum" control is more likely to
be available on all platforms.


Regarding the region-aware RDT case, I wonder if we actually need to emulate the
legacy MB control using MB_MAX. First, when we refer to "legacy" in the context of
region-aware RDT, I suppose it means "MSR access" plus "percentage-based control".

case 1:
If the platform does not support region-aware RDT (no ERDT table is detected),
the MB is naturally the "legacy" MB, and the info directory would look like:

info
└── MB
        └── resource_schemata
                └── MB

case 2:
If the platform supports region-aware RDT (i.e., ERDT parsing succeeds),
then the structure looks like below:

info
└── MB
        └── resource_schemata
                ├── MB                  <=== legacy
                ├── MB_REGION0_OPT
                ├── MB_REGION1_OPT
                ├── MB_REGION0_MIN
                ├── MB_REGION1_MIN
                ├── MB_REGION0_MAX
                └── MB_REGION1_MAX


This may be slightly off-topic from MAX emulation, but I have another
thought regarding multi-controllers for rdt_resource:
As we know, with N regions, an MB resource will have a total of N × 3
controllers. Given that the current PoC iterates through every controller
within the resource in resctrl_resource_ctrl_get(), could this increase
lookup latency?
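
To make the concern concrete, here is a minimal userspace sketch of a
by-name linear lookup. It is not the PoC code: every demo_* name is
hypothetical, and it only assumes that the lookup walks all controllers of
a resource, so the number of comparisons grows with N × 3.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a resctrl controller; not the PoC's type. */
struct demo_ctrl {
	const char *name;
};

/* Hypothetical stand-in for a resource with N regions x 3 controllers. */
struct demo_resource {
	const struct demo_ctrl *ctrls;
	size_t nr_ctrls;
};

/*
 * Linear search by name. Returns the index of the match, or -1 if there
 * is none. *nr_cmp receives the number of string comparisons performed,
 * which is at most nr_ctrls.
 */
static int demo_ctrl_lookup(const struct demo_resource *r, const char *name,
			    size_t *nr_cmp)
{
	size_t i;

	assert(r && name && nr_cmp);

	*nr_cmp = 0;
	for (i = 0; i < r->nr_ctrls; i++) {
		(*nr_cmp)++;
		if (strcmp(r->ctrls[i].name, name) == 0)
			return (int)i;
	}
	return -1;
}

/* Two regions, three controllers each (names taken from case 2 above). */
static const struct demo_ctrl demo_mb_ctrls[] = {
	{ "MB_REGION0_OPT" }, { "MB_REGION0_MIN" }, { "MB_REGION0_MAX" },
	{ "MB_REGION1_OPT" }, { "MB_REGION1_MIN" }, { "MB_REGION1_MAX" },
};

static const struct demo_resource demo_mb = {
	.ctrls = demo_mb_ctrls,
	.nr_ctrls = sizeof(demo_mb_ctrls) / sizeof(demo_mb_ctrls[0]),
};
```

With two regions, looking up the last entry costs six comparisons, and a
miss costs the same. Whether that matters depends on how hot the path is.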

I studied the cgroup code and found that each controller for a cgroup
resource uses a dedicated cftype. For example:
static struct cftype memory_files[] = {
	{ .name = "min", .write = memory_min_write, .seq_show = memory_min_show },
	{ .name = "max", .write = memory_max_write, .seq_show = memory_max_show },
	...
};

The min/max memory controllers can be accessed in O(1) time: of_cft(of)
returns of->kn->priv (the cftype), and the handler is then called directly
as cft->write(of, buf, ...).

rftype is resctrl's equivalent of cftype, and schemata is currently implemented
as a single rftype. Would it make sense to define a separate rftype for each
resctrl controller (or maybe revisit this in the future, considering that this
is not a critical path)?
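
As a rough illustration of the per-controller idea, here is a userspace
sketch modelled loosely on the cftype pattern. It is not a proposed resctrl
interface: the demo_* types and handlers are hypothetical, and the priv
pointer only mimics the role of kn->priv. Each file carries a pointer to its
own file type, so a write reaches the right handler without any search.

```c
#include <assert.h>
#include <stddef.h>

struct demo_file;

/* Hypothetical per-controller file type, loosely modelled on cftype. */
struct demo_ftype {
	const char *name;
	int (*write)(struct demo_file *f, unsigned int val);
};

/* Hypothetical open-file object; priv plays the role of kn->priv. */
struct demo_file {
	const struct demo_ftype *priv;
	unsigned int min;
	unsigned int max;
};

static int demo_min_write(struct demo_file *f, unsigned int val)
{
	f->min = val;
	return 0;
}

static int demo_max_write(struct demo_file *f, unsigned int val)
{
	f->max = val;
	return 0;
}

/* One entry per controller, each with its own handler. */
static const struct demo_ftype demo_mb_files[] = {
	{ .name = "MB_MIN", .write = demo_min_write },
	{ .name = "MB_MAX", .write = demo_max_write },
};

/* Dispatch through the per-file pointer: no iteration over controllers. */
static int demo_file_write(struct demo_file *f, unsigned int val)
{
	assert(f && f->priv && f->priv->write);
	return f->priv->write(f, val);
}
```

The cost of a write here is one pointer dereference and one indirect call,
regardless of how many controllers the resource has.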

thanks,
Chenyu