Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
From: Reinette Chatre
Date: Tue Jun 09 2026 - 12:23:51 EST
Hi Ben,
On 6/9/26 3:10 AM, Ben Horgan wrote:
> On 6/8/26 17:16, Reinette Chatre wrote:
>> On 6/5/26 9:37 AM, Ben Horgan wrote:
>>> On 6/5/26 16:39, Reinette Chatre wrote:
>>>> Hi Ben,
>>>>
>>>> On 6/5/26 7:53 AM, Ben Horgan wrote:
>>>>> On 6/4/26 18:43, Reinette Chatre wrote:
>>>>>> On 6/3/26 8:15 AM, Ben Horgan wrote:
>>>>>>> On 5/29/26 19:06, Reinette Chatre wrote:
>>>>
>>>> ...
>>>>
>>>>>>
>>>>>>> I plumbed in support for the MB_MIN resource schema which also works under light
>>>>>>> testing. The only fs resctrl code change I needed was:
>>>>>>>
>>>>>>> --- a/include/linux/resctrl.h
>>>>>>> +++ b/include/linux/resctrl.h
>>>>>>> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct
>>>>>>> resctrl_ctrl *ctrl)
>>>>>>> case RESCTRL_CTRL_BITMAP:
>>>>>>> return BIT_MASK(ctrl->cache.cbm_len) - 1;
>>>>>>> case RESCTRL_CTRL_SCALAR:
>>>>>>> + if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
>>>>>>> + return ctrl->membw.min_bw;
>>>>>>> +
>>>>>>> return ctrl->membw.max_bw;
>>>>>>> }
>>>>>>>
>>>>>>>
>>>>>>> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
>>>>>>> as the maximum bandwidth controls only take effect if their value is higher than
>>>>>>> the minimum bandwidth value. I have specialised this on the ctrl->name which
>>>>>>> breaks your ctrl->type based classification but that's fixable by just adding a
>>>>>>> default field to membw.
>>>>>>
>>>>>> This I am not sure about. In my understanding a typical "default" value means
>>>>>> "no throttling" and, at least on Intel, this default hardware state has been
>>>>>> summarized as "min" == "max" == "optimal".
>>>>>
>>>>> Ok, this sounds odd to me but that is probably because I don't know what Intel
>>>>> systems do. On MPAM systems a MIN control is a boost rather than a throttling
>>>>> control. Although, you can always think of that as throttling the traffic with
>>>>> the other PARTIDs.
>>>>>
>>>>>>
>>>>>> Are you saying that on MPAM systems if "min" == "max" then max bandwidth controls
>>>>>> do not take effect? Could you please elaborate what happens if "min" == "max"?
>>>>>
>>>>> Table 5-4 from section 5.2.8 of the IHI0099B.b shows the interaction between the
>>>>> min and maximum controls.
>>>>>
>>>>> If used bandwidth is    The preference is    Description
>>>>> Below the minimum       High                 Only high requests compete with this
>>>>>                                              request.
>>>>>
>>>>> Above the minimum:      Medium               High requests are serviced first then
>>>>> Below the maximum                            this request competes with other
>>>>>                                              medium requests.
>>>>>
>>>>> Above the maximum,      Low                  Requests are not serviced if any high
>>>>> when HARDLIM is 0                            or medium requests are available.
>>>>>
>>>>> Above the maximum,      None                 Requests are not serviced
>>>>> when HARDLIM is 1
>>>>>
>>>>> So if we keep the minimum and the maximum controls values always the same then
>>>>> all traffic will be given "high" preference until the target bandwidth is
>>>>> reached. For some MPAM systems it is recommended to set the minimum value as 5%
>>>>> less than the maximum value to get a reliable target bandwidth. As 5% seems
>>>>> implementation specific and some systems don't have min controls it seemed
>>>>> better to just match the MB control with a maximum bandwidth control and let the
>>>>> user have freedom to choose the minimum bandwidth control when MB_MIN support is
>>>>> added.
>>>>>
>>>>> If a default for the minimum of the maximum possible bandwidth is used (100%)
>>>>> then any change of the maximum won't have any effect as it's always less than
>>>>> minimum (if that's unchanged) and so all traffic is high preference. I now see
>>>>> from your reply below that you are planning on not allowing this kind of
>>>>> configuration.
>>>>>
>>>>> If the minimum always tracks the maximum then we lose the distinction between
>>>>> medium and high preference traffic and so to reserve some high preference
>>>>> bandwidth for one control group we'd have to change the configuration in the
>>>>> other control groups so that their bandwidth preference is medium (minimum
>>>>> value at 0).
>>>>
>>>> I do not think we are talking about the same thing here. I am *not* saying
>>>> that minimum and maximum controls should always be the same.
>>>>
>>>> The discussion is about a proposed change to resctrl_get_default_ctrlval(). resctrl
>>>> uses this function in two places:
>>>> - When creating a new resource group:
>>>> The intention here is that when user space creates a new resource group it should
>>>> be created with maximum allocations possible. For MBA this means "unthrottled".
>>>
>>> I would contend that for minimum controls a policy of 'maximum allocation
>>> possible' isn't a useful default. I try and explain a bit more below.
>>>
>>>> After creating the resource group user space can adjust allocations to match
>>>> workload requirements.
>>>> - When unmounting the resctrl fs.
>>>> The intention here is that all controls are set to unthrottled to stop any possible
>>>> impact to system when user space stops using resctrl.
>>>>
>>>> resctrl_get_default_ctrlval() is thus intended to support an unthrottled baseline from
>>>> where user space can make configuration changes as supported by hardware and required
>>>> by workloads.
>>>
>>> The baseline that I see makes most sense for a minimum control is to have the
>>> default as 0. This just means that there is no "guaranteed"/high preference
>>> bandwidth reserved for the control group. I would say this is still unthrottled but
>>> just not giving a boost. With this default the user can use MB (backed by max
>>> bandwidth) without having to know about MB_MIN (keeping it constant). If the
>>> default is 100% for min bandwidth then the user needs to know to set MB_MIN to
>>> be able to use MB. Having a default of 100% for max bandwidth, correspondingly
>>> means a user can change MB_MIN and see guaranteed bandwidth effects without
>>> having to know about MB/MB_MAX.
>>>
>>> Does this make sense?
>>
>> I see. I've only considered the original scenario where MPAM's MB is emulated with
>> both MIN and MAX controls based on examples in
>> https://lore.kernel.org/lkml/aPJP52jXJvRYAjjV@xxxxxxxxxxxxxxx/
>
> Ok, but I'm not sure why that's relevant. I feel we are talking at cross
> purposes here and so I'll try and answer your questions but I'm likely missing
> the point.
>
>>
>> There seem to be two issues here:
>> a) Since some systems require MIN to be 5% less than MAX, the MPAM driver may not know what
>> the system MIN and MAX difference should be to get optimal bandwidth.
>> b) Some MPAM systems have MAX control but not the MIN control.
>>
>> The proposal to only emulate MB with MAX is clear but it is not obvious why, on a
>> system that supports both MIN and MAX, MB cannot be emulated with both as in the
>> original proposal.
>
> I don't see the advantage of emulating MB with both MIN and MAX. Just going by
> the MPAM specification, a system keeping MIN at 0 and just setting MAX from MB,
> (MIN=0, MAX=MB) should behave the same as one always setting both, (MIN=MB,
> MAX=MB). In the MIN=0 case there is never any high preference traffic and in the
> MIN=MAX_MB case there is never any medium preference traffic. It seemed best to
> not rely on any platform specific heuristics to try and guess what's better and
> just wait til the time we could support MB_MIN in resctrl (and leave the
> decision up to the user). My expectation was that this would be the simplest
> course of action.

This sounds fair. Two observations:
- The hierarchy exposed by resctrl may be different on systems that have the "same"
  controls.
  For example, on an MPAM system (if I understand correctly) the user may see:
	info/
	└── MB/
	    └── resource_schemata/
	        ├── MB/
	        │   └── MB_MAX/
	        └── MB_MIN/

  Compared with a possible implementation on Intel that looks like:
	info/
	└── MB/
	    └── resource_schemata/
	        ├── MB/
	        │   └── MB_OPT/
	        ├── MB_MAX/
	        └── MB_MIN/

  On Intel these controls are optional so could even look more like MPAM:
	info/
	└── MB/
	    └── resource_schemata/
	        └── MB/
	            └── MB_MAX/

  Dave Martin had some musings about how to present controls to user space with
  some semblance of consistency but we were not able to finalize that.
  At this time it seems that there may be agreement on needing hierarchies as
  above but I am concerned about the complicated interface that results from
  inconsistencies between systems. While complicated it does represent the relationships
  accurately and thus could also be seen as working as intended ... so perhaps we
  just need to take great care in documenting this hierarchy mechanism as the
  source for control information as opposed to the schemata file.
- I see that the default of zero for the MIN control can be appropriate. For the
  usage of resctrl_get_default_ctrlval() when creating a new control group this
  could be ok as a general setting since the user is expected to, after creating
  the resource group, adjust control values as required by the workloads it will
  contain. I need to make sure about this from all architectures and if 0 is not
  ok then we could use Drew's suggestion of architecture providing a specific reset
  value.
  For the other usage of resctrl_get_default_ctrlval() that resets controls on
  unmount there is not an opportunity for user space to adjust and here more
  care needs to be taken to match architecture requirements. The architecture
  provided "reset" value does sound more appealing.
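
To make the architecture-provided reset value idea concrete, here is a minimal
sketch of how the default lookup could avoid specialising on the control name.
Everything in it is a simplified, hypothetical stand-in: struct sketch_ctrl is
not the kernel's struct resctrl_ctrl, sketch_get_default_ctrlval() is not the
real resctrl_get_default_ctrlval(), and the reset_val field is an assumption
about where an architecture could place its value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified control types (not the kernel's definitions). */
enum sketch_ctrl_type {
	SKETCH_CTRL_BITMAP,
	SKETCH_CTRL_SCALAR,
};

/* Hypothetical, flattened description of one control. */
struct sketch_ctrl {
	enum sketch_ctrl_type type;
	uint32_t cbm_len;	/* bitmap controls: number of bits in the mask */
	uint32_t min_bw;	/* scalar controls: smallest value user space may write */
	uint32_t max_bw;	/* scalar controls: largest value user space may write */
	uint32_t reset_val;	/* scalar controls: assumed arch-provided reset value */
};

/*
 * Return the value a control gets when a group is created or when the
 * filesystem is unmounted. The decision depends only on the control type;
 * a MIN control with reset_val == 0 and a MAX control with
 * reset_val == max_bw need no name-based special case.
 */
static uint32_t sketch_get_default_ctrlval(const struct sketch_ctrl *ctrl)
{
	assert(ctrl != NULL);

	switch (ctrl->type) {
	case SKETCH_CTRL_BITMAP:
		assert(ctrl->cbm_len >= 1 && ctrl->cbm_len <= 32);
		/* All cbm_len bits set; 64-bit shift avoids overflow at 32. */
		return (uint32_t)((UINT64_C(1) << ctrl->cbm_len) - 1);
	case SKETCH_CTRL_SCALAR:
		/* The reset value must itself be a value user space could write. */
		assert(ctrl->reset_val >= ctrl->min_bw &&
		       ctrl->reset_val <= ctrl->max_bw);
		return ctrl->reset_val;
	}

	return 0;
}
```

With this shape an MPAM-like MB_MIN control could be described with
reset_val = 0 while MB_MAX uses reset_val = max_bw, and both the group
creation and the unmount paths would pick up whatever the architecture chose.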
>
>> Is this motivated by (a) where MPAM driver just does not know
>> what the "max" of a MIN control value should be?
>
> What's the "max" of a MIN control?
>
> The maximum value we can set it to?
The maximum value a user can set it to. I understood from an earlier comment that there
are some MPAM systems where the MIN needs to be set to an implementation specific 5% less
than MAX. I interpreted this to mean that it is difficult for the MPAM driver to always know
what the maximum value is that a user can write to the MIN control.
>
> We can work that out. For MPAM it's writing all 1s to the register which for the
> minimum case represents ((2**mbw_min_wd)/(2**mbw_min_wd)) * 100 %
>
>> Just emulating MB with MAX control
>> does not seem to eliminate this problem since between the fs and arch resctrl still needs
>> to ensure that when user space writes a control value to the MIN control that it is valid
>> for the underlying MPAM system.
>>
>> It almost sounds as though there is an attempt to eliminate resctrl's usage of a "max"
>> value for the MIN control since that is effectively unknown to MPAM but that does
>> not look possible to me?
>
> Sorry but I haven't understood what you're saying. What does "resctrl's usage of a
> "max" value for MIN control" mean?
Basically it is resctrl fs's validation of user input. Specifically, in bw_validate() where
the fs does this range check:
	if (bw < r->membw.min_bw || bw > r->membw.max_bw)
resctrl fs thus uses the "max" value of a control for user input checking and it seemed to
me that it may be difficult for MPAM to learn that "max" from all systems but it sounds as
though the plan is instead to use the max that the architecture supports?
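
For reference, the range check can be reduced to the standalone sketch below.
The helper name sketch_bw_in_range() and its flat parameters are hypothetical;
the only part taken from resctrl is the comparison itself. It shows why the fs
needs some "max" for every scalar control, MIN controls included: without an
upper bound there is nothing to compare user input against.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the comparison quoted above: a value
 * written by user space is accepted only if it lies within the
 * [min_bw, max_bw] range advertised for that control.
 */
static bool sketch_bw_in_range(uint32_t bw, uint32_t min_bw, uint32_t max_bw)
{
	if (bw < min_bw || bw > max_bw)
		return false;

	return true;
}
```

If the architecture reports the largest value its register can represent as
max_bw, then any tighter, platform-specific relationship between MIN and MAX
(such as the 5% gap mentioned earlier) is left to user space rather than being
enforced by this check.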
Reinette