Re: [PATCH v6 07/22] x86/resctrl: Introduce the interface to display monitor mode

From: James Morse
Date: Fri Aug 16 2024 - 12:56:42 EST


Hi Babu,

On 06/08/2024 23:00, Babu Moger wrote:
> The mbm_mode displays list of monitor modes supported.
>
> The mbm_cntr_assign is one of the currently supported modes. It is also
> called ABMC (Assignable Bandwidth Monitoring Counters) feature. ABMC
> feature provides option to assign a hardware counter to an RMID and
> monitor the bandwidth as long as it is assigned. ABMC mode is enabled
> by default when supported.
>
> Legacy mode works without the assignment option.
>
> Provide an interface to display the monitor mode on the system.
> $cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> [mbm_cntr_assign]
> legacy
>
> Switching the mbm_mode will reset all the mbm counters of all resctrl
> groups.

> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 30586728a4cd..d4ec605b200a 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -257,6 +257,40 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x30;1=0x30;3=0x15;4=0x15
>
> +"mbm_mode":
> + Reports the list of assignable monitoring features supported. The
> + enclosed brackets indicate which feature is enabled.
> + ::
> +
> + cat /sys/fs/resctrl/info/L3_MON/mbm_mode
> + [mbm_cntr_assign]
> + legacy
> +
> + "mbm_cntr_assign":
> + AMD's ABMC feature is one of the mbm_cntr_assign mode supported.
> + The bandwidth monitoring feature on AMD system only guarantees
> + that RMIDs currently assigned to a processor will be tracked by
> + hardware. The counters of any other RMIDs which are no longer
> + being tracked will be reset to zero. The MBM event counters
> + return "Unavailable" for the RMIDs that are not tracked by
> + hardware. So, there can be only limited number of groups that can
> + give guaranteed monitoring numbers. With ever changing configurations
> + there is no way to definitely know which of these groups are being
> + tracked for certain point of time. Users do not have the option to
> + monitor a group or set of groups for certain period of time without
> + worrying about RMID being reset in between.
> +
> + The ABMC feature provides an option to the user to assign a hardware
> + counter to an RMID and monitor the bandwidth as long as it is assigned.
> + The assigned RMID will be tracked by the hardware until the user
> + unassigns it manually. There is no need to worry about counters being
> + reset during this period.

While debugging my rebase of MPAM on top of this series, I've come back to this wording a
few times to try and work out what I should expect to see ...

Is it possible to disentangle the AMD hardware feature description from the description of
the filesystem behaviour this enables? You are really describing what the hardware does if
you don't enable this mode...

An incomplete suggestion of the shape would be something like:

| In mbm_cntr_assign mode user-space is able to specify which control
| or monitor groups in resctrl should have a hardware counter assigned
| using the 'mbm_control' file. The number of hardware counters available
| is described in the 'num_mbm_cntrs' file.
| Changing this mode will cause all counters on a resource to reset.
|
| The feature is needed on platforms which support more control and monitor
| groups than hardware counters, meaning 'unassigned' control or monitor groups
| will either report 'Unavailable' or miss some of the traffic in an unpredictable way.
|
| Platforms with AMD's ABMC feature enable this mode by default so that counters
| remain assigned even when the corresponding RMID is not in use by any processor.


> + "Legacy":

Calling "enough hardware counters" 'legacy' is a bit curious ... would 'default' be better?
(but I haven't worked out the benefit of disabling this mode, so maybe it doesn't need a
name.)

> + Legacy mode works without the assignment option. The monitoring works
> + as long as there are enough RMID counters available to support number
> + of monitoring groups.

How can user-space tell this is the case? Could we be specific as to what 'works' means?

Something like:
| By default resctrl assumes each control and monitor group has a hardware counter.
| Hardware without this property will still allow more control or monitor groups
| than 'num_mbm_cntrs' to be created. Reading the mbm files may report
| 'Unavailable' if there is no hardware resource assigned.
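
From user-space, 'works' would then come down to whether a read returns a number. A
minimal sketch, assuming an mbm file holds either a decimal byte count or the literal
string 'Unavailable' as described above; parse_mbm_bytes() is a hypothetical helper name.

```python
def parse_mbm_bytes(text):
    """Return the byte count as an int, or None if the read was 'Unavailable'.

    Assumption: the file holds either a decimal number or the string
    'Unavailable'. Anything else makes int() raise ValueError.
    """
    value = text.strip()
    if value == "Unavailable":
        return None
    return int(value)


# Two illustrative reads: one backed by a counter, one not.
for raw in ("123456\n", "Unavailable\n"):
    count = parse_mbm_bytes(raw)
    if count is None:
        print("no hardware counter assigned")
    else:
        print(f"{count} bytes")
```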


N.B. I don't suggest referring to the num_rmid file in these as MPAM doesn't have an
equivalent property.


Thanks,

James