Re: [PATCH v4 00/23] Intel vendor events and TMA 5.01 metrics

From: Ian Rogers
Date: Tue Feb 04 2025 - 23:58:33 EST


On Tue, Feb 4, 2025 at 8:28 PM Falcon, Thomas <thomas.falcon@xxxxxxxxx> wrote:
>
> On Tue, 2025-02-04 at 13:35 -0800, Ian Rogers wrote:
> > On Tue, Feb 4, 2025 at 1:33 PM Ian Rogers <irogers@xxxxxxxxxx> wrote:
> > >
> > > Update the Intel vendor events to the latest.
> > > Update the metrics to TMA 5.01.
> > > Add Arrowlake and Clearwaterforest support.
> > > Add metrics for LNL and GNR.
> > > Address IIO uncore issue spotted on EMR, GRR, GNR, SPR and SRF.
> > >
> > > The perf json was generated using the script:
> > > https://github.com/intel/perfmon/blob/main/scripts/create_perf_json.py
> > > with the generated json being in:
> > > https://github.com/intel/perfmon/tree/main/scripts/perf
> > >
> > > Thanks to Perry Taylor <perry.taylor@xxxxxxxxx>, Caleb Biggers
> > > <caleb.biggers@xxxxxxxxx>, Edward Baker <edward.baker@xxxxxxxxx>
> > > and Weilin Wang <weilin.wang@xxxxxxxxx> for helping get this patch
> > > series together.
> > >
> > > v4: Fix TSC events on hybrid mistakenly specifying the core PMU
> > > inhibiting the use of the msr PMU.
> > > v3: Fixes for hybrid metrics that were missing PMU. Update to the
> > > latest events.
> > > v2: Fix hybrid and Co-authored-by tag issues reported by
> > > Arnaldo. Updates to Lunarlake and Meteorlake events. Addition of
> > > Clearwaterforest.
> >
> > Sorry, forgot to add Thomas again.
> > https://lore.kernel.org/lkml/20250204213259.127939-1-irogers@xxxxxxxxxx/
>
> Hi, I'm seeing some warnings like this and the all metrics test is
> skipped:
>
> Testing tma_info_inst_mix_iparith
> FP issues
> Cannot resolve IDs for tma_info_inst_mix_iparith:
> cpu_core@INST_RETIRED.ANY@ / (cpu_core@FP_ARITH_INST_RETIRED.SCALAR@ +
> cpu_core@FP_ARITH_INST_RETIRED.VECTOR@)
> Testing tma_info_inst_mix_iparith_avx128
> FP issues
> Cannot resolve IDs for tma_info_inst_mix_iparith_avx128:
> cpu_core@INST_RETIRED.ANY@ /
> (cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_DOUBLE@ +
> cpu_core@FP_ARITH_INST_RETIRED.128B_PACKED_SINGLE@)
> Testing tma_info_inst_mix_iparith_avx256
> FP issues
> Cannot resolve IDs for tma_info_inst_mix_iparith_avx256:
> cpu_core@INST_RETIRED.ANY@ /
> (cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_DOUBLE@ +
> cpu_core@FP_ARITH_INST_RETIRED.256B_PACKED_SINGLE@)
> Testing tma_info_inst_mix_iparith_scalar_dp
> FP issues
> Cannot resolve IDs for tma_info_inst_mix_iparith_scalar_dp:
> cpu_core@INST_RETIRED.ANY@ /
> cpu_core@FP_ARITH_INST_RETIRED.SCALAR_DOUBLE@
> Testing tma_info_inst_mix_iparith_scalar_sp
> FP issues
> Cannot resolve IDs for tma_info_inst_mix_iparith_scalar_sp:
> cpu_core@INST_RETIRED.ANY@ /
> cpu_core@FP_ARITH_INST_RETIRED.SCALAR_SINGLE@

Thanks Tom, we've gone from a fail to a skip, so progress! I don't think
this is actually something to worry about. These metrics measure
floating point and vector instructions, but the workload we run when
testing the metrics doesn't execute any floating point or vector
operations. The counters for these instructions therefore don't count
anything, which causes problems for the metrics built on them. Because
of this I added some logic to skip the test when we see these failures:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/stat_all_metrics.sh?h=perf-tools-next#n51
but a better fix would be to have a workload with FP and AMX operations.

You could check manually that these metrics work by running something
like:
$ perf stat -M tma_info_inst_mix_iparith <benchmark>
where <benchmark> needs to contain FP or AMX instructions.

Thanks,
Ian