Re: [PATCH v1 1/4] x86,fs/resctrl: Make resctrl_arch_is_evt_configurable() aware of mbm_assign_mode

From: Reinette Chatre

Date: Wed Mar 04 2026 - 14:24:03 EST


Hi Ben,

On 3/4/26 9:37 AM, Ben Horgan wrote:
> On 3/4/26 17:02, Reinette Chatre wrote:
>> On 3/4/26 3:07 AM, Ben Horgan wrote:
>>> On 3/3/26 18:09, Reinette Chatre wrote:
>>>> On 3/3/26 4:29 AM, Ben Horgan wrote:
>>>>> On Mon, Mar 02, 2026 at 03:11:48PM -0800, Reinette Chatre wrote:
>>>>>> Hi Ben,
>>>>>>
>>>>>> On 2/25/26 12:19 PM, Ben Horgan wrote:
>>>>>>> The features BMEC and ABMC provide separate interfaces for configuring which
>>>>>>> bandwidth types a counter tracks. Currently
>>>>>>> resctrl_arch_is_evt_configurable() only ever returns true if BMEC is
>>>>>>> supported.
>>>>>>>
>>>>>>> ABMC is useful even when BMEC is supported as it also provides counter
>>>>>>> assignment which reduces the number of hardware monitors a system
>>>>>>> requires. It is an architectural detail that ABMC provides counter
>>>>>>
>>>>>> Since the goal is to support MPAM I'd suggest that the first focus be on what
>>>>>> resctrl fs supports and exposes and how it does or does not work for MPAM.
>>>>>>
>>>>>>> configurability without requiring the prior feature, BMEC. On MPAM systems
>>>>>>> these two features are independent and the bandwidth types are limited to a
>>>>>>> choice of only read or write.
>>>>>>
>>>>>> Does MPAM support exactly these two features? Specifically, does MPAM support
>>>>>> a feature that allows user to configure events globally per domain and another
>>>>>> feature that allows user to configure events per PMG?
>>>>>
>>>>> No, the bandwidth type configuration in MPAM is per counter and so effectively
>>>>> per (PARTID, PMG) pair. In supporting hardware, the configuration is made in the
>>>>> RWBW field of MSMON_CFG_MBWU_FLT and allows counting of just read, just write,
>>>>> or both.
>>>>
>>>> Thank you for confirming.
>>>>
>>>> Since BMEC event configuration is per domain I do not believe BMEC is relevant to MPAM.
>>>>
>>>>
>>>>>> These different features are how I understand assignable counters and BMEC to
>>>>>
>>>>> We are each approaching this from a different view point. I've just been looking at
>>>>> ABMC as a way of dealing with systems where there are fewer hardware counters than
>>>>> (PARTID, PMG) pairs (num_rmid) by requiring a counter to be assigned to a
>>>>> CTRL_MON or MON group in order to be usable. resctrl otherwise expects a counter
>>>>> per CTRL_MON/MON group. Sharing bandwidth counters doesn't work
>>>>
>>>> No, resctrl does not expect a counter per CTRL_MON/MON group - in assignable
>>>> counter mode the counter assignment is per monitoring group AND event as a pair:
>>>> (CTRL_MON/MON group, event).
>>>
>>> Yes but these counters aren't necessarily fungible. For MPAM the
>>> mbm_local_bytes and mbm_total_bytes are necessarily backed by different
>>> hardware counters. An MPAM bandwidth counter just counts all traffic on
>>> a link with the only configurability being for read/write. The counters
>>> are just placed at different points in the topology to get the different
>>> events.
>>
>> The distinction between "different hardware counters for mbm_local_bytes and
>> mbm_total_bytes" and "The counters are just placed at different point in the
>> topology" is not clear to me". The former implies different counters for the
>> two events while the latter implies the same counters are used for both events
>> but perhaps accumulated/displayed differently?
>
> For a given RIS (an MPAM device hardware unit, of which an MSC may
> consist of one or more), there are MPAMF_MBWUMON_IDR.NUM_MON hardware
> bandwidth counters which measure traffic passing a specific point with
> no filtering on where it is going. The filtering of these counters is
> set up in MSMON_CFG_MBWU_FLT, which only allows filtering by PMG, PARTID,
> and read/write.

Thank you for the details. Is the expectation that the user should be able to
program all these counters via resctrl? If an MSC consists of multiple RIS
with different counters then things get complicated very fast. Could it be
constrained to only expose the maximum number of counters supported by
all RIS at a particular scope? This would match what the existing
num_mbm_cntrs file supports.


> Whether these counters count traffic that leaves the local NUMA node or
> only traffic that is internal to the NUMA node is a hardware design-time
> (or perhaps firmware) decision, and so the mbm_local_bytes/mbm_total_bytes
> distinction is a property of the RIS/MSC.

mbm_local_bytes and mbm_total_bytes are already established as counting
external bandwidth. Specifically, mbm_local_bytes counts L3 external
bandwidth satisfied by local memory.
Do you have insight into what these systems will actually end up being
programmed with? It is difficult to reason with so many hypotheticals.
I wonder if it may not be simpler to expose a new unique event for the
internal numbers? Could initial work be constrained to fit into
existing definitions and then build from there?


> By different counters I'm referring to different RIS and by "different
> places in the topology" I'm referring to the design decision of where
> you put those RIS.

resctrl is very much focused on monitoring external memory bandwidth at L3 scope.
Monitoring memory bandwidth at different scopes still needs to be supported.
This sounds related to the work Fenghua is planning? RISC-V also seems
to have requirements around monitoring at different scopes. Also, for
reference, https://lore.kernel.org/lkml/fb1e2686-237b-4536-acd6-15159abafcba@xxxxxxxxx/

Could we start by seeing how MPAM parts that support monitoring of external bandwidth
at L3 scope can be supported, evaluate what is missing, and work from there?

>> I re-read the thread starting with
>> https://lore.kernel.org/lkml/CALPaoCh+mRLJEfhKBve3hRf+vHHoObjvWRt74OfpopgtR9g9FQ@xxxxxxxxxxxxxx/
>> and it sounded to me as though MPAM would only expose the mbm_total_bytes event.
>
> That's the case initially but is only due to current hardware support
> and what can be described by ACPI at the moment.

I am becoming lost here. Are we discussing adding features to resctrl to support
ACPI changes that are currently under discussion, for hardware that may or may
not be built, based on what those ACPI descriptions may look like? This all sounds
too hypothetical to seriously consider changes to resctrl at this time.

>> Ignoring for a moment that counters could be configured to count different
>> transactions (that is, assuming all counters count the same transactions),
>> could you please clarify how MPAM determines the counts returned by the
>> mbm_local_bytes and mbm_total_bytes events respectively?
>>
>>>>> as they need a fixed (PARTID, PMG) configuration to avoid missing counts.
>>>>
>>>> It is not clear to me how counter sharing comes into play here.
>>>
>>> I was just saying it wasn't possible for bandwidth counters. For
>>> llc_occupancy, CSU in MPAM, you can share 'counters' as they can just
>>> recount to get the current cache occupancy.
>>
>> ack.
>>
>>>>> The intent of this patch is to allow splitting these two features of ABMC,
>>>>> bandwidth type configuration and hardware counter assignment, in order to just
>>>>
>>>> Why keep BMEC, which by its name does event configuration? And on top
>>>> of that, it is event configuration at a scope that MPAM does not support?
>>>>
>>>>> support the hardware counter assignment.
>>>>>
>>>>> I'm still not understanding the distinction you are making though.
>>>>> The files are,
>>>>> With ABMC:
>>>>> info/L3_MON/event_configs/mbm_[local,total]_bytes/event_filter
>>>>
>>>> This is an event configuration that is global without any assignment. This
>>>> interface communicates to user space which transactions are counted when
>>>> this particular event is assigned to a CTRL_MON/MON group. This interface
>>>> is intended to be extensible. The interface starts with the original mbm_local_bytes
>>>> and mbm_total_bytes events in order to be backward compatible. The vision is that
>>>> if the user prefers to count different transactions then they could create
>>>> a new event with the transactions needing counting. For example,
>>>>
>>>> # mkdir /sys/fs/resctrl/info/L3_MON/event_configs/just_local_slow
>>>> # echo local_reads_slow_memory > /sys/fs/resctrl/info/L3_MON/event_configs/just_local_slow/event_filter
>>>>
>>>> The events are just tracked and managed in software with the above interface,
>>>> no hardware configuration is involved at this point in the above example*.
>>>>
>>>> The new "just_local_slow" event can then be assigned to a monitor group via
>>>> mbm_L3_assignments that will at that time consume one hardware counter and
>>>> program it with the event (which transactions to monitor) and monitor group
>>>> details (PARTID, PMG).
>>>>
>>>> This is based on an original suggestion by Peter and is thus expected to
>>>> work for customers. See [1].
>>>>
>>>>> and with BMEC they are:
>>>>> info/L3_MON/mbm_[local,total]_bytes_config
>>>
>>> I see, this makes the intent much clearer to me. Thanks for sharing this
>>> plan. I think the general idea is good. To me this implies that for MPAM
>>> to support event configuration we'd want ABMC enabled at the same time.
>>> This indeed makes sense, as you can then count read and write
>>> separately for a given CTRL_MON/MON group without requiring twice the
>>> number of hardware counters.
>>>
>>> However, I now spot an existing issue: bundling mbm_local_bytes and
>>> mbm_total_bytes together into one pool of counters doesn't work for MPAM.
>>> As noted above, they require different sets of hardware counters. With
>>> the current counter assignment mode interface, num_mbm_cntrs is
>>> scoped to all MBM counters. In an MPAM system that supports both
>>> mbm_local_bytes and mbm_total_bytes, this could lead to a
>>> num_mbm_total_cntrs and a num_mbm_local_cntrs, or something equivalent.
>>
>> Is this just needed because MPAM driver does not support counter configuration
>> yet?
>
> No. As I've hopefully managed to explain a bit better above these
> necessarily come from different pools of counters.

It sounds like the "different pools" may be managed separately based on scope
and if there are different "internal" vs "external" capabilities of these counters
then indeed they need to be assigned based on the type of the event. Do you have more
details about these systems? If the "internal" vs "external" distinction is
tied to the scope then resctrl may have a clear path to support this.

>>>> This is essentially both an event configuration and assignment that is not
>>>> compatible with assignable counters. With this interface the user
>>>> both configures which transactions are counted by a particular event and
>>>> programs all counters in a domain (across all resource groups) to use that
>>>> particular configuration. Due to this incompatibility resctrl fs will not expose
>>>> BMEC files when assignable counters are enabled.
>>>>
>>>>
>>>>> In both cases they allow configuration of two event types,
>>>>> mbm_local_bytes and mbm_total_bytes. What am I missing?
>>>>
>>>> The way I see it:
>>>> BMEC: per-domain event configuration and assignment, across all resource
>>>> groups, that applies to all counters - intended to support the "default" mode
>>>> where there is no counter assignment from user space.
>>>> assignable counters: event configuration via event_filter with assignment done
>>>> separately using per resource group mbm_L3_assignments file
>>>
>>> Makes sense.
>>>
>>>>
>>>>>
>>>>>> be and to support both at the same time requires a user interface that is
>>>>>> confusing since the user can concurrently configure events globally per-domain
>>>>>> and per resource group.
>>>>>
>>>>> Sure.
>>>>>
>>>>>>
>>>>>> Could you please elaborate on how event configuration works on MPAM? I find this
>>>>>> series quite cryptic. I think it will help if you could elaborate on what the MPAM
>>>>>> capabilities are and how you expect resctrl fs to expose these features to
>>>>>> an MPAM user and how said user is expected to interact with resctrl fs to use
>>>>>> the features.
>>>>>
>>>>> Ok, firstly regarding hardware counter assignment: on MPAM systems with more
>>>>> (PARTID, PMG) pairs than bandwidth hardware counters we'd like to expose
>>>>> mbm_L3_assignments for tracking which CTRL_MON/MON groups have bandwidth
>>>>> counting events and which do not.
>>>>
>>>> ok. This sounds like assignable counters to me. I do not believe BMEC comes
>>>> into play.
>>>>
>>>>>
>>>>> I haven't put much thought into how we would support event configuration with
>>>>> MPAM but we would want something that allows the configuration per hardware
>>>>> counter or (PARTID, PMG) pair. I'd rather not commit to the existing interface
>>>>
>>>> This is what assignable counters already does, no?
>>>
>>> Isn't that only with the future plan you shared above?
>>
>> Assigning a counter to a (PARTID, PMG) pair is what assignable counters does
>> today.
>
> Yes, but isn't it the case that currently, once you've chosen the
> configuration for mbm_local_bytes and for mbm_total_bytes, each hardware
> event is tied to one of those two configurations? The future work will
> allow the user to construct custom named events to give more general
> event configuration where there can be more than 2 different
> configurations at once. (Where I'm using configuration to mean selecting
> which of the resctrl/bmec/abmc list of bandwidth types are used.)

Right.

Reinette