Re: [PATCH] Documentation/x86: Document resctrl bandwidth control units are MiB
From: Reinette Chatre
Date: Mon Apr 01 2024 - 18:44:23 EST
Hi Tony,
On 3/29/2024 9:37 AM, Reinette Chatre wrote:
> On 3/29/2024 8:31 AM, Tony Luck wrote:
>> On Thu, Mar 28, 2024 at 06:01:33PM -0700, Reinette Chatre wrote:
>>> On 3/22/2024 11:20 AM, Tony Luck wrote:
>>>> The memory bandwidth software controller uses 2^20 units rather than
>>>> 10^6. See mbm_bw_count() which computes bandwidth using the "SZ_1M"
>>>> Linux define for 0x00100000.
>>>>
>>>> Update the documentation to use MiB when describing this feature.
>>>> It's too late to fix the mount option "mba_MBps" as that is now an
>>>> established user interface.
>>>
>>> I see that this is merged already but I do not think this is correct.
>>
>> I was surprised that Ingo merged it without giving folks a chance to
>> comment.
>>
>>> Shouldn't the implementation be fixed instead? Looking at the implementation
>>> the intent appears to be clear that the goal is to have bandwidth be
>>> MBps .... that is when looking from documentation to the define
>>> (MBA_MAX_MBPS) to the comments of the function you reference above
>>> mbm_bw_count(). For example, "...and delta bandwidth in MBps ..."
>>> and "...maintain values in MBps..."
>>
>> Difficult to be sure of intent. But in general when people talk about
>> "megabytes" in the context of memory they mean 2^20. Storage capacity
>> on computers was originally in 2^20 units until the marketing teams
>> at disk drive manufacturers realized they could print numbers 4.8% bigger
>> on their products by using SI unit 10^6 Mega prefix (rising to 7.3% with
>> Giga and 10% with Tera).
>
> This is not so obvious to me. I hear what you are saying about storage
> capacity but the topic here is memory bandwidth and here I find the custom
> to be that MB/s means 10^6 bytes per second. That is looking from how DDR
> bandwidth is documented to how benchmarks like
> https://github.com/intel/memory-bandwidth-benchmarks report the data, to
> what wikipedia says in https://en.wikipedia.org/wiki/Memory_bandwidth.
>
> I also took a sample of what the perf side of things may look like
> and, for example, when looking at;
> tools/perf/pmu-events/arch/x86/sapphirerapids/spr-metrics.json
> I understand that the custom for bandwidth is MB/s. For example:
>
> {
> "BriefDescription": "DDR memory read bandwidth (MB/sec)",
> "MetricExpr": "UNC_M_CAS_COUNT.RD * 64 / 1e6 / duration_time",
> "MetricName": "memory_bandwidth_read",
> "ScaleUnit": "1MB/s"
> },
>
(Thanks to Kan Liang for explaining this to me.)
As an update on this: the perf side is not as consistent as I first
took it to be. There appear to be a "kernel side" and a "user side"
to memory bandwidth data.
On the kernel side, users can read the UNC_M_CAS_COUNT.RD and
UNC_M_CAS_COUNT.WR data directly from the events under
/sys/bus/event_source/devices/uncore_imc_*/events, and this appears to
be intended to be consumed as MiB/s as per:
$ cat /sys/bus/event_source/devices/uncore_imc_0/events/cas_count_read.unit
MiB
On the user side, using perf from userspace the metrics are obtained
from the relevant json file as quoted above, and thus when using perf
from the command line the data is in MB/sec, for example:
$ perf list
[SNIP]
llc_miss_local_memory_bandwidth_read
[Bandwidth (MB/sec) of read requests that miss the last level cache (LLC) and go to local memory]
llc_miss_local_memory_bandwidth_write
[Bandwidth (MB/sec) of write requests that miss the last level cache (LLC) and go to local memory]
llc_miss_remote_memory_bandwidth_read
[Bandwidth (MB/sec) of read requests that miss the last level cache (LLC) and go to remote memory]
llc_miss_remote_memory_bandwidth_write
[Bandwidth (MB/sec) of write requests that miss the last level cache (LLC) and go to remote memory]
loads_per_instr
[The ratio of number of completed memory load instructions to the total number completed instructions]
memory_bandwidth_read
[DDR memory read bandwidth (MB/sec)]
memory_bandwidth_total
[DDR memory bandwidth (MB/sec)]
memory_bandwidth_write
[DDR memory write bandwidth (MB/sec)]
[SNIP]
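To put a number on the gap between the two conventions, here is a rough
sketch in Python (not taken from the kernel or perf sources). The 64 bytes
per CAS comes from the MetricExpr quoted above. The helper name
bandwidth(), the CAS count, and the one second interval are made up purely
for illustration.

```python
# Illustrative only: helper name and sample values are assumptions,
# not anything from the kernel or perf.
CAS_BYTES = 64      # bytes moved per CAS, as in the quoted MetricExpr
MB = 10**6          # SI megabyte, as used by the perf json metrics (1e6)
MIB = 2**20         # binary mebibyte, i.e. the kernel's SZ_1M (0x00100000)


def bandwidth(cas_count, seconds, unit_bytes):
    """Convert a CAS count over an interval into bandwidth in the given unit."""
    return cas_count * CAS_BYTES / unit_bytes / seconds


cas_count = 156_250_000   # invented sample: 10^10 bytes of traffic
seconds = 1.0             # invented sample interval

mb_per_s = bandwidth(cas_count, seconds, MB)
mib_per_s = bandwidth(cas_count, seconds, MIB)
ratio = mb_per_s / mib_per_s

print(f"{mb_per_s:.1f} MB/s == {mib_per_s:.1f} MiB/s (ratio {ratio:.6f})")
```

The same traffic reads roughly 4.9% higher when reported in MB/s than in
MiB/s, so the choice of unit matters when comparing a throttling target
against numbers taken from perf.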
So it appears that there is no consistent custom here, and the choice of
unit may just be somebody's preference?
Reinette