Re: [PATCH v2 00/17] x86/resctrl : Support AMD Assignable Bandwidth Monitoring Counters (ABMC)

From: Moger, Babu
Date: Fri Feb 02 2024 - 16:58:12 EST


Hi Reinette,

On 2/1/2024 10:09 PM, Reinette Chatre wrote:
Hi Babu,

On 1/19/2024 10:22 AM, Babu Moger wrote:
These series adds the support for Assignable Bandwidth Monitoring Counters
Not a good start ([1]).

Yea. My bad.


(ABMC). It is also called QoS RMID Pinning feature

The feature details are documented in the APM listed below [1].
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3.3 Assignable Bandwidth
Monitoring (ABMC). The documentation is available at
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537

The patches are based on top of commit
1ac6b49423e83af2abed9be7fbdf2e491686c66b (tip/master)

# Introduction

AMD hardware can support 256 or more RMIDs. However, bandwidth monitoring
feature only guarantees that RMIDs currently assigned to a processor will
be tracked by hardware. The counters of any other RMIDs which are no longer
being tracked will be reset to zero. The MBM event counters return
"Unavailable" for the RMIDs that are not active.
Users can create 256 or more monitor groups. But there can be only limited
number of groups that can be give guaranteed monitoring numbers. With ever
"can be given"?

"can give guaranteed monitoring numbers."

I feel this looks better.


changing configurations there is no way to definitely know which of these
groups will be active for certain point of time. Users do not have the
option to monitor a group or set of groups for certain period of time
without worrying about RMID being reset in between.
The ABMC feature provides an option to the user to assign an RMID to the
hardware counter and monitor the bandwidth for a longer duration.
The assigned RMID will be active until the user unassigns it manually.
There is no need to worry about counters being reset during this period.
Additionally, the user can specify a bitmask identifying the specific
bandwidth types from the given source to track with the counter.

Without ABMC enabled, monitoring will work in current mode without
assignment option.

# Linux Implementation

Linux resctrl subsystem provides the interface to count maximum of two
memory bandwidth events per group, from a combination of available total
and local events. Keeping the current interface, users can assign a maximum
of 2 ABMC counters per group. User will also have the option to assign only
one counter to the group. If the system runs out of assignable ABMC
counters, kernel will display an error. Users need to unassign an already
assigned counter to make space for new assignments.


# Examples

a. Check if ABMC support is available
#mount -t resctrl resctrl /sys/fs/resctrl/

#cat /sys/fs/resctrl/info/L3_MON/mon_features
llc_occupancy
mbm_total_bytes
mbm_total_bytes_config
mbm_local_bytes
mbm_local_bytes_config
mbm_assign_capable ← Linux kernel detected ABMC feature

b. Check if ABMC is enabled. By default, ABMC feature is disabled.
Monitoring works in legacy monitor mode when ABMC is not enabled.

#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
0

With the introduction of "mbm_assign_enable" the entry in mon_features seems
to provide duplicate information.

ok. We can remove the text in mon_features and keep mbm_assign_enable. We need this to enable and disable the feature.


c. There will be new file "monitor_state" for each monitor group when ABMC
feature is supported. However, monitor_state is not available if ABMC is
disabled.

#cat /sys/fs/resctrl/monitor_state
Unsupported
This sounds potentially confusing since users will still be able to monitor
the groups ...
How about "Assignment-Unsupported"?


d. Read the event mbm_total_bytes and mbm_local_bytes. Without ABMC
enabled, monitoring will work in current mode without assignment option.

# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
779247936
# cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
765207488

e. Enable ABMC mode.

#echo 1 > /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
#cat /sys/fs/resctrl/info/L3_MON/mbm_assign_enable
1

f. Read the monitor states. By default, both total and local MBM
events are in "unassign" state.

#cat /sys/fs/resctrl/monitor_state
total=unassign;local=unassign
This interface does not seem to take into account that hardware
can support assignment per domain. I understand that this is
not something you want to implement at this time but the user interface
has to accommodate such an enhancement. This was already mentioned, and
you did acknowledge the point [3] to this new version that does not
reflect this is unexpected.

Yea. Domain level assignment is not supported at this point. Do you want me to explicitly mention here?

Please elaborate what you meant here.


My previous suggestions do seem to still stand and and I also am not able to
see how Peter's requests [2] were considered. This same interface needs to
accommodate usages apart from ABMC. For example, how to use this interface
to address the same counter issue on AMD hardware without ABMC, and MPAM
(pending James's feedback).

Yea. Agree. Peter's comments are not addressed. I am not all clear about details of Peters and James requirement.

With respect to ABMC here are my requirements.

a.  Assignment needs to be done at group level.

b. User should be able to assign each event individually. Multiple events assignment(in one command) should be supported.

c. I have no plans to implement domain level assignment. It is done at system level.

d. We need only couple of states.  Assigned and unassigned.

e. monitor_state is name of file for user interface. We can change that based on comments.

Peter, James,

Please comment on what you want achieve in "assignment" based on the features you are working on.

Do you want to add new states?


I understand that until we hear from Arm we do not know all the requirements
that this interface needs to support, but I do expect this interface to
at least consider requirements and usage scenarios that are already known.

Sure. Will try that in the next version. Lets continue the discussion.


g. Read the event mbm_total_bytes and mbm_local_bytes. In ABMC mode,
the MBA events are not available until the user assigns the events
explicitly.

#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_total_bytes
Unsupported

#cat /sys/fs/resctrl/mon_data/mon_L3_00/mbm_local_bytes
Unsupported

This needs some more thought to accommodate Peter's scenario where the counter
can be expected to return the final count after the counter is disabled.

I am not sure how to achieve this with ABMC. This may be applicable to soft rmid only. In case of "soft rmid", previous readings are saved in the soft rmid state.


h. The event llc_occupancy is not affected by ABMC mode. Users can still
read the llc_occupancy.

#cat /sys/fs/resctrl/mon_data/mon_L3_00/llc_occupancy
557056

i. Now assign the total event and read the monitor_state.

#echo total=assign > /sys/fs/resctrl/monitor_state
#cat /sys/fs/resctrl/monitor_state
total=assign;local=unassign

I do not see the "global assign/unassign" scenario addressed.

I am not all clear on meaning of "global assign/unassign".  Does it mean looping thru all the groups and assign the RMIDs?

It may not work in many cases. In case of ABMC, we have only limited number of hw counters. It will fail after hardware runs out of counters. It is better done selectively based on which group user is interested in.

But it can be done later if we find a use case for that.


This version seems to ignore (without discussion) a lot of earlier
feedback.

Please feel free comment. There are various threads of discussion. I may have missed.

Thanks

Babu


Reinette

[1] https://lore.kernel.org/lkml/5ce67d8f-e207-4029-8fb3-0bc7deab1e9f@xxxxxxx/
[2] https://lore.kernel.org/lkml/CALPaoCiRD6j_Rp7ffew+PtGTF4rWDORwbuRQqH2i-cY5SvWQBg@xxxxxxxxxxxxxx/
[3] https://lore.kernel.org/lkml/38421428-84cb-b67e-f3ce-b7a0233e016b@xxxxxxx/