Re: [RFC] mpam,x86,fs/resctrl: Generic schema description Proof of Concept

From: Drew Fustini

Date: Wed Jun 03 2026 - 15:40:37 EST


On Wed, Jun 03, 2026 at 04:15:51PM +0100, Ben Horgan wrote:
> Hi Reinette,
>
> On 5/29/26 19:06, Reinette Chatre wrote:
> > Hi Everybody,
> >
> > It has been a while since we discussed the resctrl changes required to support
> > hardware that has controls with fine granularity or hardware that has multiple
> > controls per resource. For reference, the most recent email discussion can
> > be found at [1] with a summary of discussions in last year's plumbers slides [2].
> >
> > I created a PoC that I believe supports what folks have agreed to so far. I
> > hope this can help us to restart the discussion with the goal that resctrl gains
> > support for upcoming hardware that require these features.
>
> Thank you very much for doing this work. I believe this will be very useful for
> MPAM and other architectures.

Yes, thanks to Reinette for working on the generic schema proof of
concept. This will be helpful for supporting the RISC-V CBQRI (capacity
and bandwidth QoS) spec.

> I plumbed in support for the MB_MIN resource schema which also works under light
> testing. The only fs resctrl code change I needed was:
>
> --- a/include/linux/resctrl.h
> +++ b/include/linux/resctrl.h
> @@ -483,6 +483,9 @@ static inline u32 resctrl_get_default_ctrlval(struct
> resctrl_ctrl *ctrl)
> case RESCTRL_CTRL_BITMAP:
> return BIT_MASK(ctrl->cache.cbm_len) - 1;
> case RESCTRL_CTRL_SCALAR:
> + if (ctrl->name == RESCTRL_CTRL_NAME_MIN)
> + return ctrl->membw.min_bw;
> +
> return ctrl->membw.max_bw;
> }
>
>
> At least on MPAM systems, we use a default of 0 for minimum bandwidth controls
> as the maximum bandwidth controls only take effect if their value is higher than
> the minimum bandwidth value. I have specialised this on the ctrl->name which
> breaks your ctrl->type based classification but that's fixable by just adding a
> default field to membw.

This should be useful for RISC-V.

RESCTRL_CTRL_NAME_MIN maps well to CBQRI Rbwb (reserved bandwidth
blocks). The sum of Rbwb across all control groups must be less than
MRBWB (maximum number of reserved bandwidth blocks). As a result, MB_MIN
needs to default to 1 so that the sum does not violate that rule. In my
RFC series, I added default_to_min to resctrl_membw [1] but this
solution looks cleaner.
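
To make the "default field" idea concrete, here is a minimal userspace
sketch, not the PoC code. The struct and function names (membw_sketch,
ctrl_sketch, sketch_default_ctrlval, sketch_rbwb_sum_ok) and the
default_bw field are all hypothetical stand-ins. The arch driver picks
the default per control, so the generic code no longer needs to look at
ctrl->name, and the second helper checks the Rbwb sum rule as I stated
it above (strictly less than MRBWB):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the PoC types, not the kernel definitions. */
enum ctrl_type_sketch { CTRL_BITMAP, CTRL_SCALAR };

struct membw_sketch {
	uint32_t min_bw;
	uint32_t max_bw;
	uint32_t default_bw;	/* assumed new field, set by the arch driver */
};

struct ctrl_sketch {
	enum ctrl_type_sketch type;
	uint32_t cbm_len;	/* only meaningful for CTRL_BITMAP */
	struct membw_sketch membw;
};

/* Default value depends only on the control type and the driver's choice. */
static uint32_t sketch_default_ctrlval(const struct ctrl_sketch *ctrl)
{
	switch (ctrl->type) {
	case CTRL_BITMAP:
		return (uint32_t)((1ULL << ctrl->cbm_len) - 1);
	case CTRL_SCALAR:
		return ctrl->membw.default_bw;
	}
	return 0;
}

/* True if the reserved blocks of all groups sum to less than mrbwb. */
static bool sketch_rbwb_sum_ok(const uint32_t *rbwb, size_t n, uint32_t mrbwb)
{
	uint64_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += rbwb[i];

	return sum < mrbwb;
}
```

With default_bw = 1 for MB_MIN, a system with N control groups starts
with a sum of N, which passes the check as long as N is below MRBWB.
Defaulting every group to max_bw would fail it as soon as there is more
than one group.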

> > - No support for "read-modify-write" usage of schemata file. This is where we
> > discussed (without agreement) on possibly introducing the "#" prefix to schemata
> > file entries. This PoC does not support this prefix and the current assumption/expectation
> > is that when user space changes a configuration only the new control values are
> > written to schemata file. I thus do not have a plan to support this so please
> > share opinions in this regard if you have some.
>
> There is now less motivation from the MPAM side for this than when this was
> initially discussed. In pre-upstream versions of the MPAM patches a change in
> the MB resource control value would change both the mpam h/w mbw_min and mbw_max
> values but now (on non-broken h/w) we just change the mbw_max. (mbw_min kept at 0).
>
> However, it would be useful not to be limited by percentages. In my quick
> experimentation with your patches I used a percentage value for MB_MIN but it
> would be best to move away from this. For new controls I think we can mandate
> that user space has to discover the resolution from the info directly but how
> can we retrofit this. For MPAM, MB and MB_MAX, would control the same things.
> Could we just add MB_MAX with a h/w friendly scale and then reflect changes in
> MB_MAX in MB and vica versa with MB taking precedent if both are set? Old
> software can continue setting MB can move to using MB_MAX and take advantage of
> the improved control. (I don't think we should expose the MPAM hardware value
> directly as it has confusion over whether all 1s is 100% or not and we'd like to
> have something generic and friendly to the user.)

The facility for non-percentage values is important for RISC-V as CBQRI
does not include a percentage throttle. It has two controls for bandwidth:

- Rbwb: number of reserved bandwidth blocks [1, 2^13]
- Mweight: weighted share of the remaining bandwidth [0, 255]
  - 0: disables work-conserving sharing
  - 1..255: compete for the leftover pool
  - It makes sense for it to default to max (255) so that there won't be
    any unused bandwidth
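
As a rough illustration of how the two controls interact, here is a
purely arithmetic sketch. It assumes that every group is contending at
the same time and that the leftover pool is split strictly in proportion
to Mweight. Real bandwidth controller arbitration may behave
differently. The names bw_group and sketch_group_blocks and the
total_blocks parameter are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-group settings: reserved blocks plus sharing weight. */
struct bw_group {
	uint32_t rbwb;		/* reserved bandwidth blocks */
	uint8_t mweight;	/* 0 = no share of the leftover pool */
};

/*
 * Blocks available to group idx in this simplified model: its own
 * reservation plus a weighted share of whatever is not reserved.
 */
static uint32_t sketch_group_blocks(const struct bw_group *g, size_t n,
				    size_t idx, uint32_t total_blocks)
{
	uint64_t reserved = 0, weights = 0, leftover;
	size_t i;

	for (i = 0; i < n; i++) {
		reserved += g[i].rbwb;
		weights += g[i].mweight;
	}

	/* Nothing to share if reservations already cover the total. */
	leftover = reserved < total_blocks ? total_blocks - reserved : 0;

	/* Weight 0 opts the group out of the leftover pool. */
	if (g[idx].mweight == 0 || weights == 0)
		return g[idx].rbwb;

	return g[idx].rbwb + (uint32_t)(leftover * g[idx].mweight / weights);
}
```

For example, with 1000 blocks in total and three groups that each
reserve 100, two groups at weight 255 split the 700 leftover blocks
evenly while a group at weight 0 stays at its reservation. That is why
defaulting Mweight to 255 avoids leaving bandwidth unused.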

I think Mweight could be aligned with MPAM's proportional stride.

Here is the patch I created to add Mweight support:

diff --git a/fs/resctrl/ctrlmondata.c b/fs/resctrl/ctrlmondata.c
index d95ab8ad36e2..3537071e3ab0 100644
--- a/fs/resctrl/ctrlmondata.c
+++ b/fs/resctrl/ctrlmondata.c
@@ -304,6 +304,7 @@ static const char * const resctrl_ctrl_name[] = {
 	[RESCTRL_CTRL_NAME_DEF] = "",
 	[RESCTRL_CTRL_NAME_MIN] = "MIN",
 	[RESCTRL_CTRL_NAME_MAX] = "MAX",
+	[RESCTRL_CTRL_NAME_WGHT] = "WGHT",
 };

const char *resctrl_ctrl_name_str(enum resctrl_ctrl_name name)
diff --git a/include/linux/resctrl.h b/include/linux/resctrl.h
index 72fb7256270e..09efcef9ce66 100644
--- a/include/linux/resctrl.h
+++ b/include/linux/resctrl.h
@@ -348,12 +348,14 @@ struct resctrl_mon {
  * has the same name as the resource.
  * @RESCTRL_CTRL_NAME_MIN: "MIN"
  * @RESCTRL_CTRL_NAME_MAX: "MAX"
+ * @RESCTRL_CTRL_NAME_WGHT: "WGHT"
  */
 enum resctrl_ctrl_name {
 	RESCTRL_CTRL_NAME_DEF,
 	RESCTRL_CTRL_NAME_MIN,
 	RESCTRL_CTRL_NAME_MAX,
-	RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_MAX
+	RESCTRL_CTRL_NAME_WGHT,
+	RESCTRL_CTRL_NAME_LAST = RESCTRL_CTRL_NAME_WGHT
 };

> > - Controls are independent for now. This means that, for example, if a resource
> > supports a "MIN" and "MAX" control then this implementation would allow user to
> > set the "maximum" control values to be less than the "minimum" control values.
>
> I think this is ok as long as adding support for new controls in resctrl doesn't
> change the existing behaviour. In MPAM we dodged this by introducing MB as only
> affecting the h/w mbw_max and not mbw_min (as mentioned above).

There is no equivalent to MB (percentage throttle) in RISC-V so I would
want it to be valid to have MB_MIN (minimum reservation) without MB.

I rebased my RISC-V CBQRI v6 series on top of this proof of concept and
was able to validate that it works in QEMU. The schemata file shows:

MB_WGHT:72=255
MB_MIN:72=756
L2:64=fff;65=fff
L3:75=ffff
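
For anyone scripting against output like the above, here is a minimal
userspace sketch of splitting one schemata line into the schema name and
its domain=value entries. This is not resctrl code. The names
schema_entry and sketch_parse_schema_line are hypothetical, and the
caller is assumed to know whether values are decimal (scalar controls)
or hex (bitmaps):

```c
#include <stdlib.h>
#include <string.h>

struct schema_entry {
	unsigned long domain;
	unsigned long value;
};

/*
 * Parse "NAME:dom=val;dom=val" into out[]. value_base is 10 for scalar
 * controls and 16 for bitmaps. Returns the number of entries parsed
 * (at most max_entries), or -1 if the line is malformed.
 */
static int sketch_parse_schema_line(const char *line, int value_base,
				    char *name_out, size_t name_len,
				    struct schema_entry *out, int max_entries)
{
	const char *colon = strchr(line, ':');
	const char *p;
	char *end;
	int n = 0;

	if (!colon || colon == line || (size_t)(colon - line) >= name_len)
		return -1;

	memcpy(name_out, line, (size_t)(colon - line));
	name_out[colon - line] = '\0';

	p = colon + 1;
	while (*p && n < max_entries) {
		/* Domain ids are decimal in the output above. */
		out[n].domain = strtoul(p, &end, 10);
		if (end == p || *end != '=')
			return -1;

		p = end + 1;
		out[n].value = strtoul(p, &end, value_base);
		if (end == p)
			return -1;
		n++;

		if (*end == ';')
			p = end + 1;
		else if (*end == '\0' || *end == '\n')
			break;
		else
			return -1;
	}

	return n;
}
```

Parsing "L2:64=fff;65=fff" with base 16 yields two entries for domains
64 and 65, and "MB_MIN:72=756" with base 10 yields a single entry for
domain 72.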

Thanks,
Drew

[1] https://lore.kernel.org/all/20260601-ssqosid-cbqri-rqsc-v7-0-v6-6-baf00f50028a@xxxxxxxxxx/