Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept

From: Reinette Chatre

Date: Thu Jun 04 2026 - 17:05:22 EST


Hi Drew,

On 6/3/26 12:34 PM, Drew Fustini wrote:
> On Wed, Jun 03, 2026 at 04:15:51PM +0100, Ben Horgan wrote:
>> Hi Reinette,
>>
>> On 5/29/26 19:06, Reinette Chatre wrote:
>>> Hi Everybody,
>>>
>>> It has been a while since we discussed the resctrl changes required to support
>>> hardware that has controls with fine granularity or hardware that has multiple
>>> controls per resource. For reference, the most recent email discussion can
>>> be found at [1] with a summary of discussions in last year's plumbers slides [2].
>>>
>>> I created a PoC that I believe supports what folks have agreed to so far. I
>>> hope this can help us to restart the discussion with the goal that resctrl gains
>>> support for upcoming hardware that require these features.
>>
>> Thank you very much for doing this work. I believe this will be very useful for
>> MPAM and other architectures.
>
> Yes, thanks to Reinette for working on the generic schema proof of
> concept. This will be helpful for supporting the RISC-V CBQRI (capacity
> and bandwidth QoS) spec.

Thank you very much for considering this work.

>
>> I plumbed in support for the MB_MIN resource schema which also works under light
>> testing. The only fs resctrl code change I needed was:
>>
>> --- a/include/linux/resctrl.h
>> +++ b/include/linux/resctrl.h
>> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct resctrl_ctrl *ctrl)
>>  	case RESCTRL_CTRL_BITMAP:
>>  		return BIT_MASK(ctrl->cache.cbm_len) - 1;
>>  	case RESCTRL_CTRL_SCALAR:
>> +		if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
>> +			return ctrl->membw.min_bw;
>> +
>>  		return ctrl->membw.max_bw;
>>  	}
>>
>>
>> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
>> as the maximum bandwidth controls only take effect if their value is higher than
>> the minimum bandwidth value. I have specialised this on the ctrl->name which
>> breaks your ctrl->type based classification but that's fixable by just adding a
>> default field to membw.
>
> This should be useful for RISC-V.
>
> RESCTRL_CTRL_NAME_MIN maps well to CBQRI Rbwb (reserved bandwidth
> blocks). The sum of Rbwb across all control groups must be less than
> MRBWB (maximum number of reserved bandwidth blocks). As a result, MB_MIN
> needs to default to 1 so that the sum does not violate that rule. In my
> RFC series, I added default_to_min to resctrl_membw [1] but this
> solution looks cleaner.

As I mentioned in response to Ben [2] there seems to be a mismatch between
architecture requirements here. resctrl uses the value returned by
resctrl_get_default_ctrlval() as the control value that means "no throttling".
For Intel this means min == max but this does not seem to be the case for MPAM
and CBQRI. I am not familiar enough with either to have an alternative proposal here
so I need to become familiar now. There is a bit of backlog on other resctrl
work right now so this will take me some time to sort out.
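
Purely to illustrate the mismatch, below is a rough user-space sketch of the
per-control default that Ben suggested. It is not PoC code: the types are
stand-ins for the kernel structures, and the "default_val" field and all
"sketch_" names are hypothetical. It assumes each architecture would fill in
the value that means "no throttling" for that control.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in types for illustration only; NOT the kernel definitions. */
enum sketch_ctrl_type { SKETCH_CTRL_BITMAP, SKETCH_CTRL_SCALAR };

struct sketch_ctrl {
	enum sketch_ctrl_type type;
	unsigned int cbm_len;	/* bitmap controls: number of mask bits */
	uint32_t min_bw;	/* scalar controls: smallest valid value */
	uint32_t max_bw;	/* scalar controls: largest valid value */
	uint32_t default_val;	/* hypothetical: value meaning "no throttling" */
};

/* Return the control value a new group would start with. */
static uint32_t sketch_get_default_ctrlval(const struct sketch_ctrl *ctrl)
{
	switch (ctrl->type) {
	case SKETCH_CTRL_BITMAP:
		/* All cbm_len bits set; the sketch only handles up to 32 bits. */
		assert(ctrl->cbm_len <= 32);
		return (uint32_t)((1ULL << ctrl->cbm_len) - 1);
	case SKETCH_CTRL_SCALAR:
		/* The default no longer depends on the control's name. */
		assert(ctrl->default_val >= ctrl->min_bw &&
		       ctrl->default_val <= ctrl->max_bw);
		return ctrl->default_val;
	}
	return 0;
}
```

With something like this, the values mentioned in this thread (0 for MB_MIN on
MPAM, 1 for MB_MIN on CBQRI) would live in the architecture's description of
the control rather than in a check on ctrl->name.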

>
>>> - No support for "read-modify-write" usage of schemata file. This is where we
>>> discussed (without agreement) on possibly introducing the "#" prefix to schemata
>>> file entries. This PoC does not support this prefix and the current assumption/expectation
>>> is that when user space changes a configuration only the new control values are
>>> written to schemata file. I thus do not have a plan to support this so please
>>> share opinions in this regard if you have some.
>>
>> There is now less motivation from the MPAM side for this than when this was
>> initially discussed. In pre-upstream versions of the MPAM patches a change in
>> the MB resource control value would change both the mpam h/w mbw_min and mbw_max
>> values but now (on non-broken h/w) we just change the mbw_max. (mbw_min kept at 0).
>>
>> However, it would be useful not to be limited by percentages. In my quick
>> experimentation with your patches I used a percentage value for MB_MIN but it
>> would be best to move away from this. For new controls I think we can mandate
>> that user space has to discover the resolution from the info directly but how
>> can we retrofit this? For MPAM, MB and MB_MAX would control the same thing.
>> Could we just add MB_MAX with a h/w friendly scale and then reflect changes in
>> MB_MAX in MB and vice versa, with MB taking precedence if both are set? Old
>> software can continue setting MB or can move to using MB_MAX and take advantage of
>> the improved control. (I don't think we should expose the MPAM hardware value
>> directly as it has confusion over whether all 1s is 100% or not and we'd like to
>> have something generic and friendly to the user.)
>
> The facility for non-percentage values is important for RISC-V as CBQRI does
> not include percentage throttle. It has two controls for bandwidth:
>
> - Rbwb: number of reserved bandwidth blocks [1, 2^13]
> - Mweight: weighted share of the remaining bandwidth [0, 255]
>   - 0: disables work-conserving sharing
>   - 1..255: compete for the leftover pool
>   - It makes sense for it to default to max (255) so that there won't be
>     any unused bandwidth
>
> I think Mweight could be aligned with MPAM's proportional stride.
>
> Here is the patch I created to add Mweight support:
>
> diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
> index d95ab8ad36e2..3537071e3ab0 100644
> --- a/fs/resctrl/ctrlmondata.c
> +++ b/fs/resctrl/ctrlmondata.c
> @@ -304,6 +304,7 @@ static const char * const resctrl_ctrl_name[] = {
>  	[RESCTRL_CTRL_NAME_DEF] = "",
>  	[RESCTRL_CTRL_NAME_MIN] = "MIN",
>  	[RESCTRL_CTRL_NAME_MAX] = "MAX",
> +	[RESCTRL_CTRL_NAME_WGHT] = "WGHT",
>  };
>
> const char *resctrl_ctrl_name_str(enum resctrl_ctrl_name name)
> diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
> index 72fb7256270e..09efcef9ce66 100644
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -348,12 +348,14 @@ struct resctrl_mon {
>   * has the same name as the resource.
>   * @RESCTRL_CTRL_NAME_MIN: "MIN"
>   * @RESCTRL_CTRL_NAME_MAX: "MAX"
> + * @RESCTRL_CTRL_NAME_WGHT: "WGHT"
>   */
>  enum resctrl_ctrl_name {
>  	RESCTRL_CTRL_NAME_DEF,
>  	RESCTRL_CTRL_NAME_MIN,
>  	RESCTRL_CTRL_NAME_MAX,
> -	RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_MAX
> +	RESCTRL_CTRL_NAME_WGHT,
> +	RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_WGHT
>  };
>
>>> - Controls are independent for now. This means that, for example, if a resource
>>> supports a "MIN" and "MAX" control then this implementation would allow user to
>>> set the "maximum" control values to be less than the "minimum" control values.
>>
>> I think this is ok as long as adding support for new controls in resctrl doesn't
>> change the existing behaviour. In MPAM we dodged this by introducing MB as only
>> affecting the h/w mbw_max and not mbw_min (as mentioned above).
>
> There is no equivalent to MB (percentage throttle) in RISC-V so I would
> want it to be valid to have MB_MIN (minimum reservation) without MB.
>
> I rebased my RISC-V CBQRI v6 series on top of this proof of concept and
> was able to validate it works okay in QEMU:
>
> MB_WGHT:72=255
> MB_MIN:72=756
> L2:64=fff;65=fff
> L3:75=ffff

Ideally any new support should not break existing user space and the existing
user interface expects a MB entry in the schemata file when the MB resource exists.
Is it possible to emulate the percentage based MB control with MB_WGHT or MB_MIN?
This sounds similar to what is/was planned for MPAM [2].
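
To make the question concrete, here is a rough user-space sketch of only the
arithmetic such an emulation would need, assuming a linear mapping from the
legacy percentage onto a hardware range of 1..max_blocks. The helper names are
made up for illustration and nothing here comes from the CBQRI series. The
sketch says nothing about whether a reservation can sensibly stand in for a
throttle, which is the part I cannot judge.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: map a legacy percentage (1..100) onto a block count
 * in 1..max_blocks, rounding up so that a non-zero percentage never maps
 * to zero blocks.
 */
static uint32_t sketch_pct_to_blocks(uint32_t pct, uint32_t max_blocks)
{
	assert(pct >= 1 && pct <= 100);
	assert(max_blocks >= 1);
	return (uint32_t)(((uint64_t)pct * max_blocks + 99) / 100);
}

/*
 * Hypothetical helper: map a block count back to a percentage, rounding
 * down. For max_blocks >= 100 this returns the percentage that was passed
 * to sketch_pct_to_blocks().
 */
static uint32_t sketch_blocks_to_pct(uint32_t blocks, uint32_t max_blocks)
{
	assert(blocks >= 1 && blocks <= max_blocks);
	return (uint32_t)(((uint64_t)blocks * 100) / max_blocks);
}
```

Taking 8192 as max_blocks (the upper end of the Rbwb range quoted above), a
legacy value of 50 would map to 4096 blocks and read back as 50.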

Something that may be of interest is a proposal that Chenyu is refining to address an
issue with the region-aware MBA support where there is no intuitive backward compatible
interface. This was highlighted in the plumbers slides (see slide titled "Open: maintaining
backward compatibility when region aware"). The current idea to deal with this is to
introduce a "mode" associated with the resource controls. For example,

# cat /sys/fs/resctrl/info/MB/resource_schemata/mode
[legacy] native

By default the "legacy" mode will be enabled and exposes the "MB" default control to user
space via the schemata file. In support of this each new control has a new property file
named "status" that can have value "enabled" or "disabled". Only "enabled" controls are
present in the schemata file but all controls are always present in the resource_schemata
directory. By writing to the "mode" file user space acknowledges familiarity with the new
"resource_schemata" based interface and can change the status of a control and
thus manage its visibility in the schemata file.
Could something like this work for CBQRI?
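
In case it helps to reason about this, below is a small user-space model of
the idea as I currently understand it. It is only a sketch: every name is
hypothetical, and refusing status changes while in "legacy" mode is my reading
of the proposal, not something that has been settled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Models the "mode" file of a resource. */
enum sketch_mode { SKETCH_MODE_LEGACY, SKETCH_MODE_NATIVE };

/* Models one control directory under resource_schemata. */
struct sketch_control {
	const char *name;	/* e.g. "MB", "MB_MIN", "MB_WGHT" */
	bool enabled;		/* models the "status" file */
};

struct sketch_resource {
	enum sketch_mode mode;
	struct sketch_control *ctrls;	/* all controls, enabled or not */
	size_t nr_ctrls;
};

/*
 * Change the status of a control. Returns 0 on success, or -1 if the
 * resource is still in legacy mode or the control does not exist.
 */
static int sketch_set_status(struct sketch_resource *r, const char *name,
			     bool enabled)
{
	size_t i;

	assert(r && name);
	if (r->mode != SKETCH_MODE_NATIVE)
		return -1;

	for (i = 0; i < r->nr_ctrls; i++) {
		if (!strcmp(r->ctrls[i].name, name)) {
			r->ctrls[i].enabled = enabled;
			return 0;
		}
	}
	return -1;
}

/* Count the controls that would appear in the schemata file. */
static size_t sketch_nr_visible(const struct sketch_resource *r)
{
	size_t i, n = 0;

	assert(r);
	for (i = 0; i < r->nr_ctrls; i++)
		n += r->ctrls[i].enabled;
	return n;
}
```

In this model a resource starts in legacy mode with only "MB" enabled, and
the other controls become visible in the schemata file only after user space
switches the mode and enables them.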

Reinette



> [1] https://lore.kernel.org/all/20260601-ssqosid-cbqri-rqsc-v7-0-v6-6-baf00f50028a@xxxxxxxxxx/

[2] https://lore.kernel.org/lkml/c78169bc-e2d6-4583-96ec-09fa6dd6653a@xxxxxxxxx/