Re: [PATCH v2 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)
From: Reinette Chatre
Date: Mon Feb 05 2024 - 17:38:54 EST
Hi Babu,
On 2/2/2024 1:57 PM, Moger, Babu wrote:
> On 2/1/2024 10:09 PM, Reinette Chatre wrote:
>> On 1/19/2024 10:22 AM, Babu Moger wrote:
>>> These series adds the support for Assignable Bandwidth Monitoring Counters
>> Not a good start ([1]).
>
> Yea. My bad.
>
>>
>>> (ABMC). It is also called QoS RMID Pinning feature
>>>
>>> The feature details are documented in the APM listed below [1].
>>> [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
>>> Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
>>> Monitoring (ABMC). The documentation is available at
>>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537
>>>
>>> The patches are based on top of commit
>>> 1ac6b49423e83af2abed9be7fbdf2e491686c66b (tip/master)
>>>
>>> # Introduction
>>>
>>> AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
>>> feature only guarantees that RMIDs currently assigned to a processor will
>>> be tracked by hardware. The counters of any other RMIDs which are no longer
>>> being tracked will be reset to zero. The MBM event counters return
>>> "Unavailable" for the RMIDs that are not active.
>>> Users can create 256 or more monitor groups. But there can be only limited
>>> number of groups that can be give guaranteed monitoring numbers. With ever
>> "can be given"?
>
> "can give guaranteed monitoring numbers."
>
> I feel this looks better.
Sounds good. Thank you.
>
>>
>>> changing configurations there is no way to definitely know which of these
>>> groups will be active for certain point of time. Users do not have the
>>> option to monitor a group or set of groups for certain period of time
>>> without worrying about RMID being reset in between.
>>> The ABMC feature provides an option to the user to assign an RMID to the
>>> hardware counter and monitor the bandwidth for a longer duration.
>>> The assigned RMID will be active until the user unassigns it manually.
>>> There is no need to worry about counters being reset during this period.
>>> Additionally, the user can specify a bitmask identifying the specific
>>> bandwidth types from the given source to track with the counter.
>>>
>>> Without ABMC enabled, monitoring will work in current mode without
>>> assignment option.
>>>
>>> # Linux Implementation
>>>
>>> Linux resctrl subsystem provides the interface to count maximum of two
>>> memory bandwidth events per group, from a combination of available total
>>> and local events. Keeping the current interface, users can assign a maximum
>>> of 2 ABMC counters per group. User will also have the option to assign only
>>> one counter to the group. If the system runs out of assignable ABMC
>>> counters, kernel will display an error. Users need to unassign an already
>>> assigned counter to make space for new assignments.
>>>
>>>
>>> # Examples
>>>
>>> a. Check if ABMC support is available
>>> #mount -t resctrl resctrl /sys/fs/resctrl/
>>>
>>> #cat /sys/fs/resctrl/info/L3_MON/mon_features
>>> llc_occupancy
>>> mbm_total_bytes
>>> mbm_total_bytes_config
>>> mbm_local_bytes
>>> mbm_local_bytes_config
>>> mbm_assign_capable ← Linux kernel detected ABMC feature
>>>
>>> b. Check if ABMC is enabled. By default, ABMC feature is disabled.
>>> Monitoring works in legacy monitor mode when ABMC is not enabled.
>>>
>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>> 0
>>>
>> With the introduction of "mbm_assign_enable" the entry in mon_features seems
>> to provide duplicate information.
>
> ok. We can remove the text in mon_features and keep mbm_assign_enable. We need this to enable and disable the feature.
This could be improved beyond a binary "enable"/"disable" interface to user space.
For example, the hardware can discover which "mbm counter assign" related feature
(I'm counting the "soft RMID" here as one of the "mbm counter assign" related
features) is supported on the platform and it can be presented to the user like:
# cat /sys/fs/resctrl/info/L3_MON/mbm_assign
[feature_1] feature_2 feature_3
The output indicates which features are supported by the platform and the brackets indicate
which feature is enabled.
>>> c. There will be new file "monitor_state" for each monitor group when ABMC
>>> feature is supported. However, monitor_state is not available if ABMC is
>>> disabled.
>>>
>>> #cat /sys/fs/resctrl/monitor_state
>>> Unsupported
>> This sounds potentially confusing since users will still be able to monitor
>> the groups ...
> How about "Assignment-Unsupported"?
(please see later)
>>
>>>
>>> d. Read the event mbm_total_bytes and mbm_local_bytes. Without ABMC
>>> enabled, monitoring will work in current mode without assignment option.
>>>
>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>> 779247936
>>> # cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>> 765207488
>>>
>>> e. Enable ABMC mode.
>>>
>>> #echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>> #cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
>>> 1
>>>
>>> f. Read the monitor states. By default, both total and local MBM
>>> events are in "unassign" state.
>>>
>>> #cat /sys/fs/resctrl/monitor_state
>>> total=unassign;local=unassign
>> This interface does not seem to take into account that hardware
>> can support assignment per domain. I understand that this is
>> not something you want to implement at this time but the user interface
>> has to accommodate such an enhancement. This was already mentioned, and
>> you did acknowledge the point [3] to this new version that does not
>> reflect this is unexpected.
>
> Yea. Domain level assignment is not supported at this point. Do you want me to explicitly mention here?
>
> Please elaborate what you meant here.
You have made it clear on several occasions that you do not intend to support
domain level assignment. That may be ok but the interface you create should
not prevent future support of domain level assignment.
If my point is not clear, could you please share how this interface is able to
support domain level assignment in the future?
I am starting to think that we need a file similar to the schemata file
for group and domain level monitor configurations.
>> My previous suggestions do seem to still stand and and I also am not able to
>> see how Peter's requests [2] were considered. This same interface needs to
>> accommodate usages apart from ABMC. For example, how to use this interface
>> to address the same counter issue on AMD hardware without ABMC, and MPAM
>> (pending James's feedback).
>
> Yea. Agree. Peter's comments are not addressed. I am not all clear
> about details of Peters and James requirement.
Peter listed his requirements in [1]. That email thread is a worthwhile read
for the use cases.
I believe that James is aware of this work and do hope to hear from him.
>
> With respect to ABMC here are my requirements.
>
> a. Assignment needs to be done at group level.
>
> b. User should be able to assign each event individually. Multiple events assignment(in one command) should be supported.
>
> c. I have no plans to implement domain level assignment. It is done at system level.
>
> d. We need only couple of states. Assigned and unassigned.
>
> e. monitor_state is name of file for user interface. We can change that based on comments.
>
> Peter, James,
>
> Please comment on what you want achieve in "assignment" based on the features you are working on.
>
> Do you want to add new states?
>
>>
>> I understand that until we hear from Arm we do not know all the requirements
>> that this interface needs to support, but I do expect this interface to
>> at least consider requirements and usage scenarios that are already known.
>
> Sure. Will try that in the next version. Lets continue the discussion.
>
>
>>> g. Read the event mbm_total_bytes and mbm_local_bytes. In ABMC mode,
>>> the MBA events are not available until the user assigns the events
>>> explicitly.
>>>
>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
>>> Unsupported
>>>
>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
>>> Unsupported
>>>
>> This needs some more thought to accommodate Peter's scenario where the counter
>> can be expected to return the final count after the counter is disabled.
>
> I am not sure how to achieve this with ABMC. This may be applicable
> to soft rmid only. In case of "soft rmid", previous readings are
> saved in the soft rmid state.
Right. Please consider this work in two parts, first, there is a generic
interface that aims to support ABMC, "soft RMID", and MPAM. Second, there
is using this interface to support ABMC.
>>> h. The event llc_occupancy is not affected by ABMC mode. Users can still
>>> read the llc_occupancy.
>>>
>>> #cat /sys/fs/resctrl/mon_data/mon_L3_00/llc_occupancy
>>> 557056
>>>
>>> i. Now assign the total event and read the monitor_state.
>>>
>>> #echo total=assign > /sys/fs/resctrl/monitor_state
>>> #cat /sys/fs/resctrl/monitor_state
>>> total=assign;local=unassign
>>>
>> I do not see the "global assign/unassign" scenario addressed.
>
> I am not all clear on meaning of "global assign/unassign". Does it
> mean looping thru all the groups and assign the RMIDs?
Please see [1].
> It may not work in many cases. In case of ABMC, we have only limited
> number of hw counters. It will fail after hardware runs out of
> counters. It is better done selectively based on which group user is
> interested in.
Right. This is one more item where the generic interface needs to
accommodate different hardware implementations. Perhaps this could
be one of the "features" exposed by (global) mbm_assign that the
user can "enable"/"disable" on demand?
> But it can be done later if we find a use case for that.
There already exists a use case as presented by Peter in support
of AMD hardware without ABMC, no?
>> This version seems to ignore (without discussion) a lot of earlier
>> feedback.
>
> Please feel free comment. There are various threads of discussion. I may have missed.
>
Reinette
[1] https://lore.kernel.org/lkml/CALPaoCiRD6j_Rp7ffew+PtGTF4rWDORwbuRQqH2i-cY5SvWQBg@xxxxxxxxxxxxxx/