Re: [PATCH v2 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

From: Reinette Chatre
Date: Fri Feb 02 2024 - 00:04:40 EST


+James

On 2/1/2024 8:09 PM, Reinette Chatre wrote:
> Hi Babu,
>
> On 1/19/2024 10:22 AM, Babu Moger wrote:
>>
>> These series adds the support for Assignable Bandwidth Monitoring Counters
>
> Not a good start ([1]).
>
>> (ABMC). It is also called QoS RMID Pinning feature
>>
>> The feature details are documented in the APM listed below [1].
>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>> Monitoring (ABMC). The documentation is available at
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>
>> The patches are based on top of commit
>> 1ac6b49423e83af2abed9be7fbdf2e491686c66b (tip/master)
>>
>> # Introduction
>>
>> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
>> feature only guarantees that RMIDs currently assigned to a processor will
>> be tracked by hardware. The counters of any other RMIDs which are no longer
>> being tracked will be reset to zero. The MBM event counters return
>> "Unavailable" for the RMIDs that are not active.
>>
>> Users can create 256 or more monitor groups. But there can be only limited
>> number of groups that can be give guaranteed monitoring numbers. With ever
>
> "can be given"?
>
>> changing configurations there is no way to definitely know which of these
>> groups will be active for certain point of time. Users do not have the
>> option to monitor a group or set of groups for certain period of time
>> without worrying about RMID being reset in between.
>>
>> The ABMC feature provides an option to the user to assign an RMID to the
>> hardware counter and monitor the bandwidth for a longer duration.
>> The assigned RMID will be active until the user unassigns it manually.
>> There is no need to worry about counters being reset during this period.
>> Additionally, the user can specify a bitmask identifying the specific
>> bandwidth types from the given source to track with the counter.
>>
>> Without ABMC enabled, monitoring will work in current mode without
>> assignment option.
>>
>> # Linux Implementation
>>
>> Linux resctrl subsystem provides the interface to count maximum of two
>> memory bandwidth events per group, from a combination of available total
>> and local events. Keeping the current interface, users can assign a maximum
>> of 2 ABMC counters per group. User will also have the option to assign only
>> one counter to the group. If the system runs out of assignable ABMC
>> counters, kernel will display an error. Users need to unassign an already
>> assigned counter to make space for new assignments.
>>
>>
>> # Examples
>>
>> a. Check if ABMC support is available
>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mon_features
>> llc_occupancy
>> mbm_total_bytes
>> mbm_total_bytes_config
>> mbm_local_bytes
>> mbm_local_bytes_config
>> mbm_assign_capable ← Linux kernel detected ABMC feature
>>
>> b. Check if ABMC is enabled. By default, ABMC feature is disabled.
>> Monitoring works in legacy monitor mode when ABMC is not enabled.
>>
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>> 0
>>
>
> With the introduction of "mbm_assign_enable" the entry in mon_features seems
> to provide duplicate information.
>
>> c. There will be new file "monitor_state" for each monitor group when ABMC
>> feature is supported. However, monitor_state is not available if ABMC is
>> disabled.
>>
>> #cat /sys/fs/resctrl/monitor_state
>> Unsupported
>
> This sounds potentially confusing since users will still be able to monitor
> the groups ...
>
>>
>> d. Read the event mbm_total_bytes and mbm_local_bytes. Without ABMC
>> enabled, monitoring will work in current mode without assignment option.
>>
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> 779247936
>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> 765207488
>>
>> e. Enable ABMC mode.
>>
>> #echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>> 1
>>
>> f. Read the monitor states. By default, both total and local MBM
>> events are in "unassign" state.
>>
>> #cat /sys/fs/resctrl/monitor_state
>> total=unassign;local=unassign
>
> This interface does not seem to take into account that hardware
> can support assignment per domain. I understand that this is
> not something you want to implement at this time but the user interface
> has to accommodate such an enhancement. This was already mentioned, and
> you did acknowledge the point [3] to this new version that does not
> reflect this is unexpected.
>
> My previous suggestions do seem to still stand and and I also am not able to
> see how Peter's requests [2] were considered. This same interface needs to
> accommodate usages apart from ABMC. For example, how to use this interface
> to address the same counter issue on AMD hardware without ABMC, and MPAM
> (pending James's feedback).
>
> I understand that until we hear from Arm we do not know all the requirements
> that this interface needs to support, but I do expect this interface to
> at least consider requirements and usage scenarios that are already known.
>
>> g. Read the event mbm_total_bytes and mbm_local_bytes. In ABMC mode,
>> the MBA events are not available until the user assigns the events
>> explicitly.
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>> Unsupported
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>> Unsupported
>>
>
> This needs some more thought to accommodate Peter's scenario where the counter
> can be expected to return the final count after the counter is disabled.
>
>> h. The event llc_occupancy is not affected by ABMC mode. Users can still
>> read the llc_occupancy.
>>
>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/llc_occupancy
>> 557056
>>
>> i. Now assign the total event and read the monitor_state.
>>
>> #echo total=assign > /sys/fs/resctrl/monitor_state
>> #cat /sys/fs/resctrl/monitor_state
>> total=assign;local=unassign
>>
>
> I do not see the "global assign/unassign" scenario addressed.
>
> This version seems to ignore (without discussion) a lot of earlier
> feedback.
>
> Reinette
>
> [1] https://lore.kernel.org/lkml/5ce67d8f-e207-4029-8fb3-0bc7deab1e9f@xxxxxxx/
> [2] https://lore.kernel.org/lkml/CALPaoCiRD6j_Rp7ffew+PtGTF4rWDORwbuRQqH2i-cY5SvWQBg@xxxxxxxxxxxxxx/
> [3] https://lore.kernel.org/lkml/38421428-84cb-b67e-f3ce-b7a0233e016b@xxxxxxx/