Re: [PATCH v11 00/23] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

From: Peter Newman
Date: Mon Feb 03 2025 - 10:02:02 EST


Hi Babu,

On Wed, Jan 22, 2025 at 9:20 PM Babu Moger <babu.moger@xxxxxxx> wrote:
>
>
> This series adds support for Assignable Bandwidth Monitoring Counters
> (ABMC), also called the QoS RMID Pinning feature.
>
> The series is written so that assignable features from other vendors
> are easier to support.
>
> The feature details are documented in the APM listed below [1].
> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
> Monitoring (ABMC). The documentation is available at
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>
> The patches are based on top of commit
> d361b84d51bfe (tip/master) Merge branch into tip/master: 'x86/tdx'
>
> # Introduction
>
> Users can create as many monitor groups as RMIDs supported by the hardware.
> However, the bandwidth monitoring feature on AMD systems only guarantees
> that RMIDs currently assigned to a processor will be tracked by hardware.
> The counters of any other RMIDs which are no longer being tracked will be
> reset to zero. The MBM event counters return "Unavailable" for the RMIDs
> that are not tracked by hardware. So only a limited number of groups can
> give guaranteed monitoring numbers. With ever-changing configurations,
> there is no way to know definitively which of these groups are being
> tracked at a given point in time. Users do not have the option to monitor
> a group or set of groups for a certain period of time without worrying
> about counters being reset in between.
>
> The ABMC feature lets the user assign a hardware counter to an
> (RMID, event) pair and monitor the bandwidth for as long as it is
> assigned. The assigned RMID will be tracked by the hardware until the user
> unassigns it manually. There is no need to worry about counters being reset
> during this period. Additionally, the user can specify a bitmask identifying
> the specific bandwidth types from the given source to track with the counter.
>
> Without ABMC enabled, monitoring works in the current 'default' mode,
> without the assignment option.
>
> # Linux Implementation
>
> Create a generic interface aimed at supporting user space assignment
> of scarce counters used for monitoring. The first user of the interface
> is ABMC, with the option to expand usage to "soft-ABMC" and MPAM
> counters in the future.

As a reminder of the work related to this, please take a look at the
thread where Reinette proposed a "shared counters" mode in
mbm_assign_control[1]. I am currently working to demonstrate that this
combined with the mbm_*_bytes_per_second events discussed earlier in
the same thread will address my users' concerns about the overhead of
reading a large number of MBM counters, resulting from a maximal
number of monitoring groups whose jobs are not isolated to any L3
monitoring domain.

ABMC will add to the number of registers which need to be programmed
in each domain, so I will need to demonstrate that ABMC combined with
these additional features addresses their performance concerns and
that the resulting interface is user-friendly enough that they will
not need a detailed understanding of the implementation to avoid an
unacceptable performance degradation (i.e., needing to understand what
conditions will increase the number of IPIs required).

If all goes well, soft-ABMC will try to extend this usage model to the
existing, pre-ABMC, AMD platforms I support.
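
For anyone joining the thread late, the assignment model described in the
quoted cover letter can be pictured with a small toy model. This is only an
illustrative sketch and not kernel code or the resctrl interface: the
CounterPool class, its method names, and the pool size are all hypothetical,
and the event name is used purely as a label. It shows that a fixed pool of
counters is handed out to (RMID, event) pairs, that only assigned pairs
accumulate bandwidth, and that a counter must be unassigned before it can be
reused.

```python
class CounterPool:
    """Toy model of a fixed pool of assignable monitoring counters.

    Hypothetical illustration only; it does not mirror the kernel's
    data structures or the hardware programming sequence.
    """

    def __init__(self, num_counters):
        self.num_counters = num_counters
        self.assigned = {}  # (rmid, event) -> accumulated byte count

    def assign(self, rmid, event):
        """Assign a counter to the pair; return False if the pool is full."""
        key = (rmid, event)
        if key in self.assigned:
            return True
        if len(self.assigned) >= self.num_counters:
            return False  # caller has to unassign something first
        self.assigned[key] = 0
        return True

    def unassign(self, rmid, event):
        """Release the counter; its accumulated count is discarded."""
        self.assigned.pop((rmid, event), None)

    def record(self, rmid, event, nbytes):
        """Account traffic, but only for pairs that hold a counter."""
        key = (rmid, event)
        if key in self.assigned:
            self.assigned[key] += nbytes

    def read(self, rmid, event):
        """Return the count, or None when no counter is assigned.

        What the real interface reports in that case is not modeled here.
        """
        return self.assigned.get((rmid, event))


pool = CounterPool(num_counters=2)
pool.assign(1, "mbm_total_bytes")
pool.assign(2, "mbm_total_bytes")
pool.record(1, "mbm_total_bytes", 4096)
print(pool.read(1, "mbm_total_bytes"))    # counted while assigned
print(pool.assign(3, "mbm_total_bytes"))  # pool is full in this toy model
```

The model deliberately ignores domains, so it says nothing about the
per-domain register programming or IPI cost mentioned above.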

Thanks,
-Peter

[1] https://lore.kernel.org/lkml/7ee63634-3b55-4427-8283-8e3d38105f41@xxxxxxxxx/