RE: [RFC PATCH v8 6/7] perf vendor events intel: Add MTL metric json files

From: Wang, Weilin
Date: Thu May 16 2024 - 13:44:14 EST

> -----Original Message-----
> From: Ian Rogers <irogers@xxxxxxxxxx>
> Sent: Thursday, May 16, 2024 9:57 AM
> To: Wang, Weilin <weilin.wang@xxxxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>; Arnaldo Carvalho de Melo
> <acme@xxxxxxxxxx>; Peter Zijlstra <peterz@xxxxxxxxxxxxx>; Ingo Molnar
> <mingo@xxxxxxxxxx>; Alexander Shishkin
> <alexander.shishkin@xxxxxxxxxxxxxxx>; Jiri Olsa <jolsa@xxxxxxxxxx>; Hunter,
> Adrian <adrian.hunter@xxxxxxxxx>; Kan Liang <kan.liang@xxxxxxxxxxxxxxx>;
> linux-perf-users@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Taylor, Perry
> <perry.taylor@xxxxxxxxx>; Alt, Samantha <samantha.alt@xxxxxxxxx>; Biggers,
> Caleb <caleb.biggers@xxxxxxxxx>
> Subject: Re: [RFC PATCH v8 6/7] perf vendor events intel: Add MTL metric json
> files
>
> On Tue, May 14, 2024 at 10:44 PM <weilin.wang@xxxxxxxxx> wrote:
> >
> > From: Weilin Wang <weilin.wang@xxxxxxxxx>
> >
> > Add MTL metric json file at TMA4.7 [1]. Some of the metrics' formulas use TPEBS
> > retire_latency in MTL.
> >
> > [1] https://lore.kernel.org/all/20240214011820.644458-1-irogers@xxxxxxxxxx/
> >
> > Signed-off-by: Weilin Wang <weilin.wang@xxxxxxxxx>
> > Reviewed-by: Ian Rogers <irogers@xxxxxxxxxx>
>
> This change works either with the approach in this series or with the
> evsel approach, so I don't mind my Reviewed-by standing. I'd prefer we
> could have an evsel read counter implementation that returns 0 so that
> we can run without retirement latency gathering.
>
> TMA 4.7 is broken in that the tma_lock_latency metric uses a
> retirement latency event outside of a max() function, so having the
> read counter return 0 would break the metric:
>
> + {
> + "BriefDescription": "This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations",
> + "MetricExpr": "MEM_INST_RETIRED.LOCK_LOADS * MEM_INST_RETIRED.LOCK_LOADS:R / tma_info_thread_clks",
> + "MetricGroup": "Offcore;TopdownL4;tma_L4_group;tma_issueRFO;tma_l1_bound_group",
> + "MetricName": "tma_lock_latency",
> + "MetricThreshold": "tma_lock_latency > 0.2 & (tma_l1_bound > 0.1 & (tma_memory_bound > 0.2 & tma_backend_bound > 0.2))",
> + "PublicDescription": "This metric represents fraction of cycles the CPU spent handling cache misses due to lock operations. Due to the microarchitecture handling of locks; they are classified as L1_Bound regardless of what memory source satisfied them. Sample with: MEM_INST_RETIRED.LOCK_LOADS_PS. Related metrics: tma_store_latency",
> + "ScaleUnit": "100%",
> + "Unit": "cpu_core"
> + },
>
> Other metrics then use that metric, specifically
> tma_info_bottleneck_memory_data_tlbs and
> tma_info_bottleneck_cache_memory_bandwidth.
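To make the failure mode concrete, here is a minimal Python sketch (not perf code) that evaluates the quoted tma_lock_latency MetricExpr with made-up counter values. The function names, the counter values and the fallback latency are all hypothetical; the "guarded" variant only illustrates the general idea of wrapping the :R term in max() and is not the formula from any TMA release.

```python
# Hypothetical sketch (not perf code): evaluate the quoted tma_lock_latency
# MetricExpr with made-up counter values to show the effect of a 0 read.


def tma_lock_latency(lock_loads: float, lock_loads_retire_latency: float,
                     thread_clks: float) -> float:
    # MEM_INST_RETIRED.LOCK_LOADS * MEM_INST_RETIRED.LOCK_LOADS:R / tma_info_thread_clks
    return lock_loads * lock_loads_retire_latency / thread_clks


def tma_lock_latency_guarded(lock_loads: float, lock_loads_retire_latency: float,
                             thread_clks: float, fallback_latency: float) -> float:
    # Assumed guarded form: max() substitutes a fallback latency when the
    # retire-latency read returns 0 (fallback_latency is a made-up value).
    return lock_loads * max(lock_loads_retire_latency, fallback_latency) / thread_clks


if __name__ == "__main__":
    loads, clks = 1_000.0, 1_000_000.0  # made-up counts
    print(tma_lock_latency(loads, 50.0, clks))               # prints 0.05 (sampled latency)
    print(tma_lock_latency(loads, 0.0, clks))                # prints 0.0 (read counter returns 0)
    print(tma_lock_latency_guarded(loads, 0.0, clks, 40.0))  # prints 0.04 (max() fallback)
```

With a zero retirement-latency read, the unguarded expression collapses to 0, so its MetricThreshold (tma_lock_latency > 0.2) can never fire, and metrics built on tma_lock_latency lose that term.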
>
> I couldn't see the updated MTL metrics in the TMA 4.8 release:
> https://github.com/intel/perfmon/pull/181/commits/d54c847b2f863c98a917bdd31a0680f4d50ff75c
> but my belief is that this issue hasn't been addressed.

I did not include TMA 4.8 here because our release of TMA 4.8 was not finalized
by the time I sent this patch set. I will add TMA 4.8 and the latest E-Core TMA
metrics in the next version.

Thanks,
Weilin
>
> Thanks,
> Ian