Re: [PATCH v8 07/25] x86/resctrl: Introduce the interface to display monitor mode

From: Reinette Chatre
Date: Tue Oct 15 2024 - 23:12:46 EST


Hi Babu,

On 10/9/24 10:39 AM, Babu Moger wrote:
> Introduce the interface file "mbm_assign_mode" to list monitor modes
> supported.
>
> The "mbm_cntr_assign" mode provides the option to assign a counter to
> an RMID, event pair and monitor the bandwidth as long as it is assigned.
>
> On AMD systems "mbm_cntr_assign" is backed by the ABMC (Assignable
> Bandwidth Monitoring Counters) hardware feature and is enabled by default.
>
> The "default" mode is the existing monitoring mode that works without the
> explicit counter assignment, instead relying on dynamic counter assignment
> by hardware that may result in hardware not dedicating a counter resulting
> in monitoring data reads returning "Unavailable".
>
> Provide an interface to display the monitor mode on the system.
> $cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> [mbm_cntr_assign]
> default
>
> Switching the mbm_assign_mode will reset all the MBM counters of all
> resctrl groups.

Please note that this now contradicts the documentation. Perhaps this sentence
can just be dropped since there is the documentation within the patch.


> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
> ---

...

> diff --git a/Documentation/arch/x86/resctrl.rst b/Documentation/arch/x86/resctrl.rst
> index 30586728a4cd..e4a7d6e815f6 100644
> --- a/Documentation/arch/x86/resctrl.rst
> +++ b/Documentation/arch/x86/resctrl.rst
> @@ -257,6 +257,40 @@ with the following files:
> # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> 0=0x30;1=0x30;3=0x15;4=0x15
>
> +"mbm_assign_mode":
> + Reports the list of monitoring modes supported. The enclosed brackets
> + indicate which mode is enabled.
> + ::
> +
> + cat /sys/fs/resctrl/info/L3_MON/mbm_assign_mode
> + [mbm_cntr_assign]
> + default
> +
> + "mbm_cntr_assign":
> +
> + In mbm_cntr_assign mode user-space is able to specify which control
> + or monitor groups in resctrl should have a counter assigned using the

Counters cannot be assigned to control groups. How about replacing all instances
of "control and monitor groups" with "CTRL_MON and MON groups", similarly
"control or monitor groups" with "CTRL_MON or MON groups".

> + 'mbm_assign_control' file. The number of counters available is described

Looking at the rest of the doc it seems that the custom is actually to place
filenames in double quotes, like "mbm_assign_control".

> + in the 'num_mbm_cntrs' file. Changing the mode may cause all counters on
> + a resource to reset.
> +
> + The mode is useful on platforms which support more control and monitor
> + groups than hardware counters, meaning 'unassigned' control or monitor
> + groups will report 'Unavailable' or count the traffic in an unpredictable
> + way.

Note two more instances of "control groups" above.

Please note that the above description implies that counter assignment is per-group. For
example, "specify which control or monitor groups in resctrl should have a counter
assigned" and "useful on platforms which support more control and monitor groups
than hardware counters". This needs to be reworked to reflect that counters
are assigned to events.

> +
> + AMD Platforms with ABMC (Assignable Bandwidth Monitoring Counters) feature
> + enable this mode by default so that counters remain assigned even when the
> + corresponding RMID is not in use by any processor.

I assume this should remain RMID since this specifically talks about an x86 system?

> +
> + "default":
> +
> + By default resctrl assumes each control and monitor group has a hardware
> + counter. Hardware that does not support 'mbm_cntr_assign' mode will still
> + allow more control or monitor groups than 'num_rmids' to be created. In
> + that case reading the mbm_total_bytes and mbm_local_bytes may report
> + 'Unavailable' if there is no counter associated with that group.
> +

I reconsidered my earlier suggestion and I believe it needs a correction since
counter assignment is not per group:

In default mode resctrl assumes there is a hardware counter for each
event within every CTRL_MON and MON group. Reading mbm_total_bytes or
mbm_local_bytes may report 'Unavailable' if there is no counter associated
with that event.

Please feel free to improve.

> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> bytes) at which a previously used LLC_occupancy

The code change looks good to me.

Reinette