Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept
From: Reinette Chatre
Date: Mon Jun 08 2026 - 12:23:57 EST
Hi Ben,
On 6/5/26 9:37 AM, Ben Horgan wrote:
> Hi Reinette,
>
> On 6/5/26 16:39, Reinette Chatre wrote:
>> Hi Ben,
>>
>> On 6/5/26 7:53 AM, Ben Horgan wrote:
>>> On 6/4/26 18:43, Reinette Chatre wrote:
>>>> On 6/3/26 8:15 AM, Ben Horgan wrote:
>>>>> On 5/29/26 19:06, Reinette Chatre wrote:
>>
>> ...
>>
>>>>
>>>>> I plumbed in support for the MB_MIN resource schema which also works under light
>>>>> testing. The only fs resctrl code change I needed was:
>>>>>
>>>>> --- a/include/linux/resctrl.h
>>>>> +++ b/include/linux/resctrl.h
>>>>> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct resctrl_ctrl *ctrl)
>>>>>  	case RESCTRL_CTRL_BITMAP:
>>>>>  		return BIT_MASK(ctrl->cache.cbm_len) - 1;
>>>>>  	case RESCTRL_CTRL_SCALAR:
>>>>> +		if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
>>>>> +			return ctrl->membw.min_bw;
>>>>> +
>>>>>  		return ctrl->membw.max_bw;
>>>>>  	}
>>>>>
>>>>>
>>>>> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
>>>>> as the maximum bandwidth controls only take effect if their value is higher than
>>>>> the minimum bandwidth value. I have specialised this on the ctrl->name which
>>>>> breaks your ctrl->type based classification but that's fixable by just adding a
>>>>> default field to membw.
>>>>
>>>> This I am not sure about. In my understanding a typical "default" value means
>>>> "no throttling" and, at least on Intel, this default hardware state has been
>>>> summarized as "min" == "max" == "optimal".
>>>
>>> Ok, this sounds odd to me but that is probably because I don't know what Intel
>>> systems do. On MPAM systems a MIN control is a boost rather than a throttling
>>> control. Although, you can always think of that as throttling the traffic with
>>> the other PARTIDs.
>>>
>>>>
>>>> Are you saying that on MPAM systems if "min" == "max" then max bandwidth controls
>>>> do not take effect? Could you please elaborate what happens if "min" == "max"?
>>>
>>> Table 5-4 from section 5.2.8 of IHI0099B.b shows the interaction between the
>>> minimum and maximum controls.
>>>
>>> If used bandwidth is     The preference is   Description
>>> Below the minimum        High                Only high requests compete with
>>>                                              this request.
>>> Above the minimum,       Medium              High requests are serviced first,
>>> below the maximum                            then this request competes with
>>>                                              other medium requests.
>>> Above the maximum,       Low                 Requests are not serviced if any
>>> when HARDLIM is 0                            high or medium requests are
>>>                                              available.
>>> Above the maximum,       None                Requests are not serviced.
>>> when HARDLIM is 1
>>>
>>> So if we keep the minimum and the maximum control values always the same then
>>> all traffic will be given "high" preference until the target bandwidth is
>>> reached. For some MPAM systems it is recommended to set the minimum value as 5%
>>> less than the maximum value to get a reliable target bandwidth. As 5% seems
>>> implementation specific and some systems don't have min controls it seemed
>>> better to just match the MB control with a maximum bandwidth control and let the
>>> user have freedom to choose the minimum bandwidth control when MB_MIN support is
>>> added.
>>>
>>> If the default for the minimum is the maximum possible bandwidth (100%)
>>> then any change of the maximum won't have any effect as it's always less than
>>> the minimum (if that's unchanged) and so all traffic is high preference. I now see
>>> from your reply below that you are planning on not allowing this kind of
>>> configuration.
>>>
>>> If the minimum always tracks the maximum then we lose the distinction between
>>> medium and high preference traffic and so to reserve some high preference
>>> bandwidth for one control group we'd have to change the configuration in the
>>> other control groups so that their bandwidth preference is medium (minimum
>>> value at 0).
>>
>> I do not think we are talking about the same thing here. I am *not* saying
>> that minimum and maximum controls should always be the same.
>>
>> The discussion is about a proposed change to resctrl_get_default_ctrlval(). resctrl
>> uses this function in two places:
>> - When creating a new resource group:
>> The intention here is that when user space creates a new resource group it should
>> be created with maximum allocations possible. For MBA this means "unthrottled".
>
> I would contend that for minimum controls a policy of 'maximum allocation
> possible' isn't a useful default. I try to explain a bit more below.
>
>> After creating the resource group user space can adjust allocations to match
>> workload requirements.
>> - When unmounting the resctrl fs.
>> The intention here is that all controls are set to unthrottled to stop any possible
>> impact to system when user space stops using resctrl.
>>
>> resctrl_get_default_ctrlval() is thus intended to support an unthrottled baseline from
>> where user space can make configuration changes as supported by hardware and required
>> by workloads.
>
> The baseline that I see makes most sense for a minimum control is to have the
> default as 0. This just means that there is no "guaranteed"/high preference
> bandwidth reserved for the control group. I would say this is still unthrottled but
> just not giving a boost. With this default the user can use MB (backed by max
> bandwidth) without having to know about MB_MIN (keeping it constant). If the
> default is 100% for min bandwidth then the user needs to know to set MB_MIN to
> be able to use MB. Having a default of 100% for max bandwidth correspondingly
> means a user can change MB_MIN and see guaranteed bandwidth effects without
> having to know about MB/MB_MAX.
>
> Does this make sense?
I see. I've only considered the original scenario where MPAM's MB is emulated with
both MIN and MAX controls based on examples in
https://lore.kernel.org/lkml/aPJP52jXJvRYAjjV@xxxxxxxxxxxxxxx/
There seem to be two issues here:
a) Since some systems require MIN to be 5% less than MAX, the MPAM driver may not know
what the difference between the system's MIN and MAX should be to get optimal bandwidth.
b) Some MPAM systems have the MAX control but not the MIN control.
The proposal to only emulate MB with MAX is clear but it is not obvious why, on a
system that supports both MIN and MAX, MB cannot be emulated with both as in the
original proposal. Is this motivated by (a), where the MPAM driver just does not know
what the "max" of a MIN control value should be? Just emulating MB with the MAX control
does not seem to eliminate this problem since, between the fs and arch code, resctrl
still needs to ensure that a control value written by user space to the MIN control is
valid for the underlying MPAM system.
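To make that requirement concrete, here is a minimal standalone sketch of the kind of
range check meant above. It is not actual resctrl or MPAM code: struct scalar_ctrl_limits
and scalar_ctrl_validate() are hypothetical names used only for illustration, and the
sketch assumes the control is described by a simple inclusive [min_bw, max_bw] range.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical, simplified description of a scalar control's valid range.
 * This is NOT the kernel's struct resctrl_membw; it exists only for this sketch.
 */
struct scalar_ctrl_limits {
	uint32_t min_bw;	/* smallest value the control accepts */
	uint32_t max_bw;	/* largest value the control accepts */
};

/*
 * Validate a value written by user space to a scalar control.
 * Returns 0 and stores the value in *out when it is within range,
 * -EINVAL otherwise. Note that rejecting an out-of-range value needs
 * an upper bound even when the control in question is a MIN control.
 */
static int scalar_ctrl_validate(const struct scalar_ctrl_limits *lim,
				uint32_t val, uint32_t *out)
{
	if (val < lim->min_bw || val > lim->max_bw)
		return -EINVAL;

	*out = val;
	return 0;
}
```

Whatever the default of a MIN control ends up being, a check of this shape cannot be
written without knowing the largest value the MIN control may take.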
It almost sounds as though there is an attempt to eliminate resctrl's usage of a "max"
value for the MIN control since that is effectively unknown to MPAM but that does
not look possible to me?
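For reference, the Table 5-4 interaction quoted above can be written down as a small
standalone model. This is only a sketch of my reading of the table, not MPAM driver code:
enum bw_pref and bw_preference() are hypothetical names, bandwidth is expressed as a
plain percentage, and the treatment of usage exactly equal to a limit (counted as "above")
is an assumption since the table does not spell it out.

```c
#include <stdbool.h>

/* Hypothetical names for the four preference levels listed in the table. */
enum bw_pref {
	BW_PREF_NONE,	/* requests are not serviced */
	BW_PREF_LOW,	/* serviced only if no high or medium requests */
	BW_PREF_MEDIUM,	/* competes with other medium requests */
	BW_PREF_HIGH,	/* competes only with other high requests */
};

/*
 * Preference of a request given the bandwidth currently used by its
 * partition and the partition's minimum and maximum control values.
 * The minimum is checked first, so a minimum above the maximum makes
 * the maximum irrelevant until usage reaches the minimum.
 */
static enum bw_pref bw_preference(unsigned int used, unsigned int min,
				  unsigned int max, bool hardlim)
{
	if (used < min)
		return BW_PREF_HIGH;
	if (used < max)
		return BW_PREF_MEDIUM;
	return hardlim ? BW_PREF_NONE : BW_PREF_LOW;
}
```

With this model, a group left at a minimum of 100 still gets BW_PREF_HIGH at a usage of 60
after its maximum is lowered to 50, while the same group with a minimum of 0 gets
BW_PREF_LOW (or BW_PREF_NONE with HARDLIM set), which is the difference between the two
candidate defaults being discussed.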
>> I see that the MPAM driver internally uses resctrl_get_default_ctrlval() in a couple
>> of places and I am not considering this usage here. If internally MPAM has other
>> usages for this function where it does not mean "unthrottled" then perhaps
>> it would be better to create a new function that matches the usage?
>
> I don't think the internal usage makes a difference here.
Thanks for checking.
>
> One process thing I was wondering about so that I know how to structure my
> patches. In the series you have a few patches which touch all architectures;
> these have the prefix mpam,x86,fs/resctrl. Is this how you would like
> cross-architecture patches to look, or is it just for convenience in the RFC and
> is a patch per architecture preferable?
A per-architecture patch is preferable. I did try to keep changes separate as much
as possible but in some places all parts needed to be changed together to ensure that
bisect continues to work. I expect I may have missed some places in the rush to get
the PoC done and those need to be fixed.
Reinette