Re: [PATCH 5/6] selftests/resctrl: Do not compare performance counters and resctrl at low bandwidth

From: Ilpo Järvinen
Date: Fri Aug 30 2024 - 07:42:59 EST


On Thu, 29 Aug 2024, Reinette Chatre wrote:

> The MBA test incrementally throttles memory bandwidth, each time
> followed by a comparison between the memory bandwidth observed
> by the performance counters and resctrl respectively.
>
> While a comparison between performance counters and resctrl is
> generally appropriate, they do not have an identical view of
> memory bandwidth. For example RAS features or memory performance
> features that generate memory traffic may drive accesses that are
> counted differently by performance counters and MBM respectively,
> for instance generating "overhead" traffic which is not counted
> against any specific RMID. As a ratio, this different view of memory
> bandwidth becomes more apparent at low memory bandwidths.

Interesting.

Some time back I prototyped a change to the MBM test: instead of using
once=false, I changed fill_buf to run N passes through the buffer, which
let me know exactly how many reads the benchmark performed. This yielded
numerical differences between all three values (# of reads, MBM, perf),
and those differences also varied from one arch to another, so it didn't
end up making a usable test.

I guess I now have an explanation for at least a part of the differences.

> It is not practical to enable/disable the various features that
> may generate memory bandwidth to give performance counters and
> resctrl an identical view. Instead, do not compare performance
> counters and resctrl view of memory bandwidth when the memory
> bandwidth is low.
>
> Bandwidth throttling behaves differently across platforms
> so it is not appropriate to drop measurement data simply based
> on the throttling level. Instead, use a threshold of 750MiB
> that has been observed to support adequate comparison between
> performance counters and resctrl.
>
> Signed-off-by: Reinette Chatre <reinette.chatre@xxxxxxxxx>
> ---
> tools/testing/selftests/resctrl/mba_test.c | 7 +++++++
> tools/testing/selftests/resctrl/resctrl.h | 6 ++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/tools/testing/selftests/resctrl/mba_test.c b/tools/testing/selftests/resctrl/mba_test.c
> index cad473b81a64..204b9ac4b108 100644
> --- a/tools/testing/selftests/resctrl/mba_test.c
> +++ b/tools/testing/selftests/resctrl/mba_test.c
> @@ -96,6 +96,13 @@ static bool show_mba_info(unsigned long *bw_imc, unsigned long *bw_resc)
>
> avg_bw_imc = sum_bw_imc / (NUM_OF_RUNS - 1);
> avg_bw_resc = sum_bw_resc / (NUM_OF_RUNS - 1);
> + if (avg_bw_imc < THROTTLE_THRESHOLD || avg_bw_resc < THROTTLE_THRESHOLD) {
> + ksft_print_msg("Bandwidth below threshold (%d MiB). Dropping results from MBA schemata %u.\n",
> + THROTTLE_THRESHOLD,
> + ALLOCATION_MAX - ALLOCATION_STEP * allocation);

The second format specifier should be %d as well.

--
i.

> + break;
> + }
> +
> avg_diff = (float)labs(avg_bw_resc - avg_bw_imc) / avg_bw_imc;
> avg_diff_per = (int)(avg_diff * 100);
>
> diff --git a/tools/testing/selftests/resctrl/resctrl.h b/tools/testing/selftests/resctrl/resctrl.h
> index 0e5456165a6a..e65c5fb76b17 100644
> --- a/tools/testing/selftests/resctrl/resctrl.h
> +++ b/tools/testing/selftests/resctrl/resctrl.h
> @@ -43,6 +43,12 @@
>
> #define DEFAULT_SPAN (250 * MB)
>
> +/*
> + * Memory bandwidth (in MiB) below which the bandwidth comparisons
> + * between iMC and resctrl are considered unreliable.
> + */
> +#define THROTTLE_THRESHOLD 750
> +
> /*
> * user_params: User supplied parameters
> * @cpu: CPU number to which the benchmark will be bound to
>