Re: [PATCH v1 1/4] x86,fs/resctrl: Make resctrl_arch_is_evt_configurable() aware of mbm_assign_mode

From: Ben Horgan

Date: Wed Mar 04 2026 - 16:02:01 EST


Hi Reinette,

On 3/4/26 19:23, Reinette Chatre wrote:
> Hi Ben,
>
> On 3/4/26 9:37 AM, Ben Horgan wrote:
>> On 3/4/26 17:02, Reinette Chatre wrote:
>>> On 3/4/26 3:07 AM, Ben Horgan wrote:
>>>> On 3/3/26 18:09, Reinette Chatre wrote:
>>>>> On 3/3/26 4:29 AM, Ben Horgan wrote:
>>>>>> On Mon, Mar 02, 2026 at 03:11:48PM -0800, Reinette Chatre wrote:
>>>>>>> Hi Ben,
>>>>>>>
>>>>>>> On 2/25/26 12:19 PM, Ben Horgan wrote:
>>>>>>>> The features BMEC and ABMC provide separate interfaces to configuring which
>>>>>>>> bandwidth types a counter tracks. Currently
>>>>>>>> resctrl_arch_is_evt_configurable() only ever returns true if BMEC is
>>>>>>>> supported.
>>>>>>>>
>>>>>>>> ABMC is useful even when BMEC is supported as it also provides counter
>>>>>>>> assignment which reduces the number of hardware monitors a system
>>>>>>>> requires. It is an architectural detail that ABMC provides counter
>>>>>>>
>>>>>>> Since the goal is to support MPAM I'd suggest that the first focus be on what
>>>>>>> resctrl fs supports and exposes and how it does or does not work for MPAM.
>>>>>>>
>>>>>>>> configurability without requiring the prior feature, BMEC. On MPAM systems
>>>>>>>> these two features are independent and the bandwidth types are limited to a
>>>>>>>> choice of only read or write.
>>>>>>>
>>>>>>> Does MPAM support exactly these two features? Specifically, does MPAM support
>>>>>>> a feature that allows user to configure events globally per domain and another
>>>>>>> feature that allows user to configure events per PMG?
>>>>>>
>>>>>> No, the bandwidth type configuration in MPAM is per counter and so effectively
>>>>>> per (PARTID, PMG) pair. In supporting hardware, the configuration is made in the
>>>>>> RWBW field of MSMON_CFG_MBWU_FLT and allows counting of just read, just write,
>>>>>> or both.
>>>>>
>>>>> Thank you for confirming.
>>>>>
>>>>> Since BMEC event configuration is per domain I do not believe BMEC is relevant to MPAM.
>>>>>
>>>>>
>>>>>>> These different features are how I understand assignable counters and BMEC to
>>>>>>
>>>>>> We are each approaching this from a different view point. I've just been looking at
>>>>>> ABMC as a way of dealing with systems where there are fewer hardware counters than
>>>>>> (PARTID, PMG) pairs (num_rmid) by requiring a counter to be assigned to a
>>>>>> CTRL_MON or MON group in order to be usable. resctrl otherwise expects a counter
>>>>>> per CTRL_MON/MON group. Sharing bandwidth counters doesn't work
>>>>>
>>>>> No, resctrl does not expect a counter per CTRL_MON/MON group - in assignable
>>>>> counter mode the counter assignment is per monitoring group AND event as a pair:
>>>>> (CTRL_MON/MON group, event).
>>>>
>>>> Yes but these counters aren't necessarily fungible. For MPAM the
>>>> mbm_local_bytes and mbm_total_bytes are necessarily backed by different
>>>> hardware counters. An MPAM bandwidth counter just counts all traffic on
>>>> a link with the only configurability being for read/write. The counters
>>>> are just placed at different points in the topology to get the different
>>>> events.
>>>
>>> The distinction between "different hardware counters for mbm_local_bytes and
>>> mbm_total_bytes" and "The counters are just placed at different point in the
>>> topology" is not clear to me. The former implies different counters for the
>>> two events while the latter implies the same counters are used for both events
>>> but perhaps accumulated/displayed differently?
>>
>> For a given RIS (the MPAM hardware unit of which an MSC may consist
>> of one or more), there are MPAMF_MBWUMON_IDR.NUM_MON hardware bandwidth
>> counters which measure traffic passing a specific point with no
>> filtering for where it's going. The filtering of these counters is
>> set up in MSMON_CFG_MBWU_FLT which only allows pmg/partid/(read/write).
>
> Thank you for the details. Is the expectation that user should be able to
> program all these counters via resctrl? If an MSC consists of multiple RIS
> with different counters then things get complicated very fast. Could it be
> constrained to only expose the maximum number of counters supported by
> all RIS at a particular scope? This would match what the existing
> num_mbm_cntrs file supports.

Not individually, no. They will generally just be one per cache slice or
memory controller, and all will be programmed together as a component.
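
As a rough userspace model of "programmed together as a component" (not
kernel code): the sketch below assumes, hypothetically, that a component
groups several RIS, that its usable counter count is the smallest NUM_MON
among them, and that assigning counter index i programs index i in every
RIS. The class and method names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Ris:
    """One RIS; num_mon models MPAMF_MBWUMON_IDR.NUM_MON."""
    num_mon: int
    # counter index -> (partid, pmg) filter, modelling MSMON_CFG_MBWU_FLT
    flt: dict = field(default_factory=dict)


@dataclass
class Component:
    """Hypothetical grouping of RIS that are always programmed together."""
    ris: list

    def num_cntrs(self):
        # Only counters present in every RIS can be exposed as one pool.
        return min(r.num_mon for r in self.ris)

    def assign(self, idx, partid, pmg):
        # Program the same counter index in each RIS of the component.
        if not 0 <= idx < self.num_cntrs():
            raise IndexError("counter index outside the shared pool")
        for r in self.ris:
            r.flt[idx] = (partid, pmg)
```

Under these assumptions a component with RIS exposing 4 and 2 monitors
would report 2 counters, which is the kind of single value num_mbm_cntrs
can express.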

>
>
>> Whether or not these count traffic that leaves the local numa node or
>> only traffic that's internal to the numa node is a h/w design time (or
>> perhaps f/w) decision and so the mbm_local_bytes/mbm_total_bytes
>> distinction is a property of the RIS/MSC.
>
> mbm_local_bytes and mbm_total_bytes are already established as counting
> external bandwidth. Specifically, mbm_local_bytes counting L3 external
> bandwidth satisfied by the local memory.
> Do you have insight into what these systems will actually end up being
> programmed with? It is difficult to reason with so many hypotheticals.
> I wonder if it may not be simpler to expose a new unique event for the
> internal numbers? Could initial work be constrained to fit into
> existing definitions and then build from there?

Yes, we can assume mpam just supports mbm_total_bytes for the moment.

>
>
>> By different counters I'm referring to different RIS and by "different
>> places in the topology" I'm referring to the design decision of where
>> you put those RIS.
>
> resctrl is very much focused on monitoring external memory bandwidth at L3 scope.
> Monitoring memory bandwidth at different scopes still needs to be supported.
> This sounds related to the work Fenghua is planning? RISC-V also seems
> to have requirements around monitoring at different scope. Also, for
> reference, https://lore.kernel.org/lkml/fb1e2686-237b-4536-acd6-15159abafcba@xxxxxxxxx/
>
> Could we start by seeing how MPAM parts that support monitoring of external bandwidth
> at L3 scope can be supported, evaluate what is missing, and work from there?

Yes.

>
>>> I re-read the thread starting with
>>> https://lore.kernel.org/lkml/CALPaoCh+mRLJEfhKBve3hRf+vHHoObjvWRt74OfpopgtR9g9FQ@xxxxxxxxxxxxxx/
>>> and it sounded to me as though MPAM would only expose the mbm_total_bytes event.
>>
>> That's the case initially but is only due to current hardware support
>> and what can be described by acpi at the moment.
>
> I am becoming lost here. Are we discussing adding features to resctrl to support
> changes to ACPI that are currently under discussion for hardware that may or
> may not be built on what those ACPI descriptions may look like? This all sounds
> too hypothetical to seriously consider changes to resctrl at this time.

Sorry, yes. I was just thinking about not constraining what is
architecturally possible, but we don't need to go there.

>
>>> Ignoring for a moment that counters could be configured to count different
>>> transactions, so assuming all counters count the same transactions. Could you
>>> please clarify how MPAM determines the counts returned by the
>>> mbm_local_bytes and mbm_total_bytes respectively?
>>>
>>>>>> as they need a fixed (PARTID, PMG) configuration to avoid missing counts.
>>>>>
>>>>> It is not clear to me how sharing counters are at play here.
>>>>
>>>> I was just saying it wasn't possible for bandwidth counters. For
>>>> llc_occupancy, CSU in MPAM, you can share 'counters' as they can just
>>>> recount to get the current cache occupancy.
>>>
>>> ack.
>>>
>>>>>> The intent of this patch is to allow splitting these two features of ABMC,
>>>>>> bandwidth type configuration and hardware counter assignment in order to just
>>>>>
>>>>> Why keep BMEC which by its name does event configuration? And then on top
>>>>> of that it is event configuration at a scope that MPAM does not support?
>>>>>
>>>>>> support the hardware counter assignment.
>>>>>>
>>>>>> I'm still not understanding the distinction you are making though.
>>>>>> The files are,
>>>>>> With ABMC:
>>>>>> info/L3_MON/event_configs/mbm_[local,total]_bytes/event_filter
>>>>>
>>>>> This is an event configuration that is global without any assignment. This
>>>>> interface communicates to user space which transactions are counted when
>>>>> this particular event is assigned to a CTRL_MON/MON group. This interface
>>>>> is intended to be extensible. The interface starts with the original mbm_local_bytes
>>>>> and mbm_total_bytes events in order to be backward compatible. The vision is that
>>>>> if the user prefers to count different transactions then they could create
>>>>> a new event with the transactions needing counting. For example,
>>>>>
>>>>> # mkdir /sys/fs/resctrl/info/L3_MON/event_configs/just_local_slow
>>>>> # echo local_reads_slow_memory > /sys/fs/resctrl/info/L3_MON/event_configs/just_local_slow/event_filter
>>>>>
>>>>> The events are just tracked and managed in software with the above interface,
>>>>> no hardware configuration is involved at this point in the above example*.
>>>>>
>>>>> The new "just_local_slow" can then be assigned to a monitor group via
>>>>> mbm_L3_assignments that will at that time consume one hardware counter and
>>>>> program it with the event (which transactions to monitor) and monitor group
>>>>> details (PARTID, PMG).
>>>>>
>>>>> This is based on original suggestion by Peter in a way that we thus expect to
>>>>> work for customers. See [1].
>>>>>
>>>>>> and with BMEC they are:
>>>>>> info/L3_MON/mbm_[local,total]_bytes_config
>>>>
>>>> I see, this makes the intent much clearer to me. Thanks for sharing this
>>>> plan. I think the general idea is good. To me this implies that for MPAM
>>>> to support event configuration we'd want ABMC enabled at the same time.
>>>> Which indeed makes sense as you can then count read and write
>>>> separately for a given CTRL_MON/MON group without requiring twice the
>>>> number of hardware counters.
>>>>
>>>> However, I now spot an existing issue, bundling mbm_local_bytes and
>>>> mbm_total_bytes together for one pool of counters doesn't work for MPAM.
>>>> As noted above they require different sets of hardware counters. With
>>>> the current counter assignment mode interface the num_mbm_cntrs is
>>>> scoped to all mbm counters. In an MPAM system that supports both
>>>> mbm_local_bytes and mbm_total_bytes this could lead to
>>>> num_mbm_total_cntrs and a num_mbm_local_cntrs or something equivalent.
>>>
>>> Is this just needed because MPAM driver does not support counter configuration
>>> yet?
>>
>> No. As I've hopefully managed to explain a bit better above these
>> necessarily come from different pools of counters.
>
> It sounds like the "different pools" may be managed separately based on scope
> and if there are different "internal" vs "external" capabilities of these counters
> then indeed they need to be assigned based on the type of the event. Do you have more
> details about these systems? If the "internal" vs "external" distinction is
> tied to the scope then resctrl may have a clear path to support this.

Not really, no. I think we are quite far away from this.

>
>>>>> This is essentially both an event configuration and assignment that is not
>>>>> compatible with assignable counters. With this interface the user
>>>>> both configures which transactions are counted by a particular event and
>>>>> programs all counters in a domain (across all resource groups) to use that
>>>>> particular configuration. Due to this incompatibility resctrl fs will not expose
>>>>> BMEC files when assignable counters are enabled.
>>>>>
>>>>>
>>>>>> In both cases they allow configuration for two event types,
>>>>>> mbm_local_bytes, and mbm_total_bytes. What am I missing?
>>>>>
>>>>> The way I see it:
>>>>> BMEC: per domain across all resource groups event configuration and assignment that
>>>>> applies to all counters - intended to support the "default" mode where there
>>>>> is no counter assignment from user space.
>>>>> assignable counters: event configuration via event_filter with assignment done
>>>>> separately using per resource group mbm_L3_assignments file
>>>>
>>>> Makes sense.
>>>>
>>>>>
>>>>>>
>>>>>>> be and to support both at the same time requires a user interface that is
>>>>>>> confusing since the user can concurrently configure events globally per-domain
>>>>>>> and per resource group.
>>>>>>
>>>>>> Sure.
>>>>>>
>>>>>>>
>>>>>>> Could you please elaborate how event configuration works on MPAM? I find this
>>>>>>> series quite cryptic. I think it will help if you could elaborate what MPAM
>>>>>>> capabilities are and how you expect resctrl fs to expose these features to
>>>>>>> an MPAM user and how said user is expected to interact with resctrl fs to use
>>>>>>> the features.
>>>>>>
>>>>>> Ok, firstly regarding hardware counter assignment, on MPAM systems with more
>>>>>> (PARTID, PMG) pairs than bandwidth hardware counters we'd like to expose the
>>>>>> mbm_L3_assignments for tracking which CTRL_MON/MON groups have bandwidth
>>>>>> counting events and otherwise not.
>>>>>
>>>>> ok. This sounds like assignable counters to me. I do not believe BMEC comes
>>>>> into play.
>>>>>
>>>>>>
>>>>>> I haven't put much thought into how we would support event configuration with
>>>>>> MPAM but we would want something that allows the configuration per hardware
>>>>>> counter or (PARTID, PMG) pair. I'd rather not commit to the existing interface
>>>>>
>>>>> This is what assignable counters already does, no?
>>>>
>>>> Isn't that only with the future plan you shared above?
>>>
>>> Assigning a counter to a (PARTID, PMG) pair is what assignable counters does
>>> today.
>>
>> Yes, but isn't it the case that currently, once you've chosen the
>> configuration for mbm_local_bytes and for mbm_total_bytes, each hardware
>> event is tied to one of those two configurations? The future work will
>> allow the user to construct custom named events to give more general
>> event configuration where there can be more than 2 different
>> configurations at once. (Where I'm using configuration to mean selecting
>> which of the resctrl/bmec/abmc list of bandwidth types are used.)
>
> Right.
>
> Reinette
>

So, to try and bring this back to what can be done now for MPAM to fit
into the counter assignment mode interface: just support mbm_total_bytes,
and then num_mbm_cntrs is correct (nothing to do). Make the event_filter
file always display all the bandwidth types and make that the only value
it accepts (instead of hiding the event_filter file). If you agree I'll
respin with that.
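
As a rough illustration of the proposed event_filter behaviour, here is a
userspace model (not the actual patch). The function names are
hypothetical, and the list of bandwidth types is an assumption taken from
the memory transaction types in the x86 resctrl documentation.

```python
# Userspace model only; names below are illustrative, not kernel symbols.

# Assumed full list of bandwidth types (from the x86 resctrl documentation).
ALL_BW_TYPES = (
    "local_reads",
    "remote_reads",
    "local_non_temporal_writes",
    "remote_non_temporal_writes",
    "local_reads_slow_memory",
    "remote_reads_slow_memory",
    "dirty_victim_writes_all",
)


def event_filter_show():
    """Always report every bandwidth type, comma separated."""
    return ",".join(ALL_BW_TYPES)


def event_filter_write(buf):
    """Accept a write only if it names exactly the full set of types.

    Returns 0 on success and raises ValueError otherwise, standing in
    for the -EINVAL a real handler would return.
    """
    requested = {t.strip() for t in buf.strip().split(",") if t.strip()}
    if requested != set(ALL_BW_TYPES):
        raise ValueError("only the full set of bandwidth types is accepted")
    return 0
```

With this model, writing back whatever event_filter_show() returns
succeeds, while any subset such as "local_reads" is rejected, so the file
stays visible but is effectively fixed.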

Thanks,

Ben