Re: [PATCH v9 13/13] Documentation/x86: Update resctrl.rst for new features

From: Reinette Chatre
Date: Thu Dec 15 2022 - 13:33:05 EST


Hi Babu,

On 12/1/2022 7:37 AM, Babu Moger wrote:
> Update the documentation for the new features:
> 1. Slow Memory Bandwidth allocation (SMBA).
> With this feature, the QOS enforcement policies can be applied
> to the external slow memory connected to the host. QOS enforcement
> is accomplished by assigning a Class Of Service (COS) to a processor
> and specifying allocations or limits for that COS for each resource
> to be allocated.
>
> 2. Bandwidth Monitoring Event Configuration (BMEC).
> The bandwidth monitoring events mbm_total_bytes and mbm_local_bytes
> are set to count all the total and local reads/writes respectively.
> With the introduction of slow memory, the two counters are not
> enough to count all the different types of memory events. With the
> feature BMEC, the users have the option to configure mbm_total_bytes
> and mbm_local_bytes to count the specific type of events.
>
> Also add configuration instructions with examples.
>
> Signed-off-by: Babu Moger <babu.moger@xxxxxxx>
> Reviewed-by: Bagas Sanjaya <bagasdotme@xxxxxxxxx>
> ---
> Documentation/x86/resctrl.rst | 138 ++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 136 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/x86/resctrl.rst b/Documentation/x86/resctrl.rst
> index 71a531061e4e..60761a6f9087 100644
> --- a/Documentation/x86/resctrl.rst
> +++ b/Documentation/x86/resctrl.rst
> @@ -17,14 +17,16 @@ AMD refers to this feature as AMD Platform Quality of Service(AMD QoS).
> This feature is enabled by the CONFIG_X86_CPU_RESCTRL and the x86 /proc/cpuinfo
> flag bits:
>
> -============================================= ================================
> +=============================================== ================================
> RDT (Resource Director Technology) Allocation "rdt_a"
> CAT (Cache Allocation Technology) "cat_l3", "cat_l2"
> CDP (Code and Data Prioritization) "cdp_l3", "cdp_l2"
> CQM (Cache QoS Monitoring) "cqm_llc", "cqm_occup_llc"
> MBM (Memory Bandwidth Monitoring) "cqm_mbm_total", "cqm_mbm_local"
> MBA (Memory Bandwidth Allocation) "mba"
> -============================================= ================================
> +SMBA (Slow Memory Bandwidth Allocation) "smba"
> +BMEC (Bandwidth Monitoring Event Configuration) "bmec"
> +=============================================== ================================
>
> To use the feature mount the file system::
>
> @@ -161,6 +163,79 @@ with the following files:
> "mon_features":
> Lists the monitoring events if
> monitoring is enabled for the resource.
> + Example::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/mon_features
> + llc_occupancy
> + mbm_total_bytes
> + mbm_local_bytes
> +
> + If the system supports Bandwidth Monitoring Event
> + Configuration (BMEC), then the bandwidth events will
> + be configurable. The output will be::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/mon_features
> + llc_occupancy
> + mbm_total_bytes
> + mbm_total_bytes_config
> + mbm_local_bytes
> + mbm_local_bytes_config
> +
> +"mbm_total_bytes_config", "mbm_local_bytes_config":
> + These files contain the current event configuration for the events

"These files" is redundant because this list is already introduced with "the
following files:". To match the descriptions of similar files it could read:
"Read/write files containing the configuration for the mbm_total_bytes and
mbm_local_bytes events, respectively, ..."

> + mbm_total_bytes and mbm_local_bytes, respectively, when the
> + Bandwidth Monitoring Event Configuration (BMEC) feature is supported.
> + The event configuration settings are domain specific and will affect

"will" can be dropped?

> + all the CPUs in the domain.
> +
> + Following are the types of events supported:
> +
> + ==== ========================================================
> + Bits Description
> + ==== ========================================================
> + 6 Dirty Victims from the QOS domain to all types of memory
> + 5 Reads to slow memory in the non-local NUMA domain
> + 4 Reads to slow memory in the local NUMA domain
> + 3 Non-temporal writes to non-local NUMA domain
> + 2 Non-temporal writes to local NUMA domain
> + 1 Reads to memory in the non-local NUMA domain
> + 0 Reads to memory in the local NUMA domain
> + ==== ========================================================
> +
> + By default, the mbm_total_bytes configuration is set to 0x7f to count
> + all the event types and the mbm_local_bytes configuration is set to
> + 0x15 to count all the local memory events.
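
To make the two default values concrete, here is a minimal sketch (not part of the patch) that decodes a configuration value into the event types from the table above. The names BMEC_EVENTS and decode_event_mask are hypothetical helpers. The bit descriptions are copied from the table.

```python
# Bit positions and descriptions copied from the table in the patch.
BMEC_EVENTS = {
    0: "Reads to memory in the local NUMA domain",
    1: "Reads to memory in the non-local NUMA domain",
    2: "Non-temporal writes to local NUMA domain",
    3: "Non-temporal writes to non-local NUMA domain",
    4: "Reads to slow memory in the local NUMA domain",
    5: "Reads to slow memory in the non-local NUMA domain",
    6: "Dirty Victims from the QOS domain to all types of memory",
}


def decode_event_mask(mask):
    """Return the descriptions of all event types selected by mask."""
    if mask & ~0x7F:
        raise ValueError(f"bits outside 0-6 set in {mask:#x}")
    return [desc for bit, desc in BMEC_EVENTS.items() if mask & (1 << bit)]


# Show what the two documented defaults select.
for name, default in (("mbm_total_bytes", 0x7F), ("mbm_local_bytes", 0x15)):
    print(f"{name} default {default:#x}:")
    for desc in decode_event_mask(default):
        print(f"  {desc}")
```

With this decoding, 0x15 selects bits 0, 2 and 4, which are the three "local" entries in the table, and 0x7f selects all seven entries.
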
> +
> + Examples:
> +
> + * To view the current configuration::
> + ::
> +
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> + 0=0x7f;1=0x7f;2=0x7f;3=0x7f
> +
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> + 0=0x15;1=0x15;3=0x15;4=0x15
> +
> + * To change the mbm_total_bytes to count only reads on domain 0,
> + the bits 0, 1, 4 and 5 need to be set, which is 110011b in binary
> + (in hexadecimal 0x33):
> + ::
> +
> + # echo "0=0x33" > /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> +
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_total_bytes_config
> + 0=0x33;1=0x7f;2=0x7f;3=0x7f
> +
> + * To change the mbm_local_bytes to count all the slow memory reads on
> + domain 0 and 1, the bits 4 and 5 need to be set, which is 110000b
> + in binary (in hexadecimal 0x30):
> + ::
> +
> + # echo "0=0x30;1=0x30" > /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> +
> + # cat /sys/fs/resctrl/info/L3_MON/mbm_local_bytes_config
> + 0=0x30;1=0x30;3=0x15;4=0x15
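
The arithmetic in the examples above can be checked with a small sketch (illustration only, not part of the patch). The helper names mask_from_bits, parse_config and format_config are hypothetical. The sketch assumes the "<domain>=<hex value>;..." format shown in the example output and does not touch /sys/fs/resctrl.

```python
def mask_from_bits(bits):
    """Combine event bit numbers (0-6) into a configuration value."""
    mask = 0
    for bit in bits:
        if not 0 <= bit <= 6:
            raise ValueError(f"unsupported event bit {bit}")
        mask |= 1 << bit
    return mask


def parse_config(line):
    """Parse '0=0x7f;1=0x7f' into {0: 0x7f, 1: 0x7f}."""
    config = {}
    for entry in line.strip().split(";"):
        domain, value = entry.split("=")
        config[int(domain)] = int(value, 16)
    return config


def format_config(config):
    """Format {0: 0x33, 1: 0x7f} back into '0=0x33;1=0x7f'."""
    return ";".join(f"{d}={v:#x}" for d, v in sorted(config.items()))


# Count only reads (bits 0, 1, 4 and 5) on domain 0.
config = parse_config("0=0x7f;1=0x7f;2=0x7f;3=0x7f")
config[0] = mask_from_bits([0, 1, 4, 5])
print(format_config(config))

# Count only slow memory reads (bits 4 and 5).
print(f"{mask_from_bits([4, 5]):#x}")
```

Only the domains named in the write change in the examples, which is why the sketch updates a single dictionary entry and leaves the other domains at their previous value.
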
>
> "max_threshold_occupancy":
> Read/write file provides the largest value (in
> @@ -464,6 +539,25 @@ Memory bandwidth domain is L3 cache.
>
> MB:<cache_id0>=bw_MBps0;<cache_id1>=bw_MBps1;...
>
> +Slow Memory Bandwidth Allocation (SMBA)
> +---------------------------------------
> +AMD hardware supports Slow Memory Bandwidth Allocation (SMBA).
> +CXL.memory is the only supported "slow" memory device. With the
> +support of SMBA, the hardware enables bandwidth allocation on
> +the slow memory devices. If there are multiple such devices in
> +the system, the throttling logic groups all the slow sources
> +together and applies the limit on them as a whole.
> +
> +The presence of SMBA (with CXL.memory) is independent of slow memory
> +devices presence. If there are no such devices on the system, then
> +configuring SMBA will have no impact on the performance of the system.
> +
> +The bandwidth domain for slow memory is L3 cache. Its schemata file
> +is formatted as:
> +::
> +
> + SMBA:<cache_id0>=bandwidth0;<cache_id1>=bandwidth1;...
> +
> Reading/writing the schemata file
> ---------------------------------
> Reading the schemata file will show the state of all resources
> @@ -479,6 +573,46 @@ which you wish to change. E.g.
> L3DATA:0=fffff;1=fffff;2=3c0;3=fffff
> L3CODE:0=fffff;1=fffff;2=fffff;3=fffff
>
> +Reading/writing the schemata file (on AMD systems)
> +--------------------------------------------------
> +Reading the schemata file will show the current bandwidth limit on all
> +domains. The allocated resources are in multiples of one eighth GB/s.
> +When writing to the file, you need to specify what cache id you wish to
> +configure the bandwidth limit.
> +
> +For example, to allocate 2GB/s limit on the first cache id:
> +
> +::
> +
> + # cat schemata
> + MB:0=2048;1=2048;2=2048;3=2048
> + L3:0=ffff;1=ffff;2=ffff;3=ffff
> +
> + # echo "MB:1=16" > schemata
> + # cat schemata
> + MB:0=2048;1= 16;2=2048;3=2048
> + L3:0=ffff;1=ffff;2=ffff;3=ffff
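
The relation between the value written to the schemata file and the resulting limit can be sketched as follows (illustration only, not part of the patch). It assumes the unit of one eighth GB/s stated in the text above. The function names are hypothetical.

```python
UNITS_PER_GBPS = 8  # one schemata unit is 1/8 GB/s, per the text above


def gbps_to_schemata_value(gbps):
    """Convert a limit in GB/s to the integer written to schemata."""
    units = gbps * UNITS_PER_GBPS
    if units != int(units):
        raise ValueError(f"{gbps} GB/s is not a multiple of 1/8 GB/s")
    return int(units)


def schemata_value_to_gbps(value):
    """Convert an integer read from schemata to a limit in GB/s."""
    return value / UNITS_PER_GBPS


print(gbps_to_schemata_value(2))     # value to write for a 2GB/s limit
print(schemata_value_to_gbps(2048))  # limit implied by the value 2048
```

Under that assumption a 2GB/s limit corresponds to the value 16 used in the example, and the value 2048 shown before the write corresponds to 256 GB/s.
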
> +
> +Reading/writing the schemata file (on AMD systems) with SMBA feature
> +--------------------------------------------------------------------
> +Reading and writing the schemata file is the same as without SMBA in
> +above section.
> +
> +For example, to allocate 8GB/s limit on the first cache id:
> +
> +::
> +
> + # cat schemata
> + SMBA:0=2048;1=2048;2=2048;3=2048
> + MB:0=2048;1=2048;2=2048;3=2048
> + L3:0=ffff;1=ffff;2=ffff;3=ffff
> +
> + # echo "SMBA:1=64" > schemata
> + # cat schemata
> + SMBA:0=2048;1= 64;2=2048;3=2048
> + MB:0=2048;1=2048;2=2048;3=2048
> + L3:0=ffff;1=ffff;2=ffff;3=ffff
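
For a script that consumes this output, a minimal parsing sketch follows (illustration only, not part of the patch). It assumes the "<resource>:<cache_id>=<value>;..." layout shown in the examples, including the space padding in "1= 64". The function name parse_schemata is hypothetical. Values are kept as strings because L3 holds a hexadecimal bitmask while MB and SMBA hold decimal numbers.

```python
def parse_schemata(text):
    """Parse schemata output into {resource: {cache_id: value_string}}."""
    resources = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, domains = line.partition(":")
        entries = {}
        for entry in domains.split(";"):
            cache_id, _, value = entry.partition("=")
            # Values may be padded with spaces, as in "1=  64".
            entries[int(cache_id)] = value.strip()
        resources[name.strip()] = entries
    return resources


sample = """\
SMBA:0=2048;1=  64;2=2048;3=2048
MB:0=2048;1=2048;2=2048;3=2048
L3:0=ffff;1=ffff;2=ffff;3=ffff
"""

schemata = parse_schemata(sample)
smba_limit = int(schemata["SMBA"][1]) / 8  # units of 1/8 GB/s
print(f"SMBA limit on cache id 1: {smba_limit} GB/s")
```
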
> +
> Cache Pseudo-Locking
> ====================
> CAT enables a user to specify the amount of cache space that an
>
>

Based on earlier comments, I am still awaiting information on whether more
detail or an example is needed to describe to the user what can be expected
after a counter configuration is made.

Reinette