Re: [PATCH] perf test: Retry without grouping for all metrics test

From: Ian Rogers
Date: Wed Dec 06 2023 - 13:51:09 EST


On Wed, Dec 6, 2023 at 9:54 AM Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
>
> Em Wed, Dec 06, 2023 at 08:35:23AM -0800, Ian Rogers escreveu:
> > On Wed, Dec 6, 2023 at 5:08 AM Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
> > > Humm, I'm not being able to reproduce here the problem, before applying
> > > this patch:
>
> > Please don't apply the patch. The patch masks a bug in metrics/PMUs
>
> I didn't
>
> > and the proper fix was:
> > 8d40f74ebf21 perf vendor events amd: Fix large metrics
> > https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@xxxxxxx
>
> that is upstream:
>
> ⬢[acme@toolbox perf-tools-next]$ git log tools/perf/pmu-events/arch/x86/amdzen1/recommended.json
> commit 8d40f74ebf217d3b9e9b7481721e6236b857cc55
> Author: Sandipan Das <sandipan.das@xxxxxxx>
> Date: Thu Jul 6 12:04:40 2023 +0530
>
> perf vendor events amd: Fix large metrics
>
> There are cases where a metric requires more events than the number of
> available counters. E.g. AMD Zen, Zen 2 and Zen 3 processors have four
> data fabric counters but the "nps1_die_to_dram" metric has eight events.
>
> By default, the constituent events are placed in a group and since the
> events cannot be scheduled at the same time, the metric is not computed.
> The "all metrics" test also fails because of this.
>
> Use the NO_GROUP_EVENTS constraint for such metrics which anyway expect
> the user to run perf with "--metric-no-group".
>
> E.g.
>
> $ sudo perf test -v 101
>
> Before:
>
> 101: perf all metrics test :
> --- start ---
> test child forked, pid 37131
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Metric 'nps1_die_to_dram' not printed in:
> Error:
> Invalid event (dram_channel_data_controller_4) in per-thread mode, enable system wide with '-a'.
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with -1
> ---- end ----
> perf all metrics test: FAILED!
>
> After:
>
> 101: perf all metrics test :
> --- start ---
> test child forked, pid 43766
> Testing branch_misprediction_ratio
> Testing all_remote_links_outbound
> Testing nps1_die_to_dram
> Testing macro_ops_dispatched
> Testing all_l2_cache_accesses
> Testing all_l2_cache_hits
> Testing all_l2_cache_misses
> Testing ic_fetch_miss_ratio
> Testing l2_cache_accesses_from_l2_hwpf
> Testing l2_cache_misses_from_l2_hwpf
> Testing op_cache_fetch_miss_ratio
> Testing l3_read_miss_latency
> Testing l1_itlb_misses
> test child finished with 0
> ---- end ----
> perf all metrics test: Ok
>
> Reported-by: Ayush Jain <ayush.jain3@xxxxxxx>
> Suggested-by: Ian Rogers <irogers@xxxxxxxxxx>
> Signed-off-by: Sandipan Das <sandipan.das@xxxxxxx>
> Acked-by: Ian Rogers <irogers@xxxxxxxxxx>
> Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx>
> Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
> Cc: Ananth Narayan <ananth.narayan@xxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Ravi Bangoria <ravi.bangoria@xxxxxxx>
> Cc: Santosh Shukla <santosh.shukla@xxxxxxx>
> Link: https://lore.kernel.org/r/20230706063440.54189-1-sandipan.das@xxxxxxx
> Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx
>
> > > Ian, I also stumbled on this:
>
> > > [root@five ~]# perf stat -M dram_channel_data_controller_4
> > > Cannot find metric or group `dram_channel_data_controller_4'
> > > ^C
> > > Performance counter stats for 'system wide':
>
> > > 284,908.91 msec cpu-clock # 32.002 CPUs utilized
> > > 6,485,456 context-switches # 22.763 K/sec
> > > 719 cpu-migrations # 2.524 /sec
> > > 32,800 page-faults # 115.125 /sec
>
> <SNIP>
>
> > > I.e. -M should bail out at that point (Cannot find metric or group `dram_channel_data_controller_4'), no?
>
> > We could. I suspect the code has always just not bailed out. I'll put
> > together a patch adding the bail out.
>
> Great, thanks,

Sent:
https://lore.kernel.org/lkml/20231206183533.972028-1-irogers@xxxxxxxxxx/

Thanks,
Ian

> - Arnaldo