Re: [PATCH] perf record: add a shortcut for metrics

From: Ian Rogers
Date: Tue May 28 2024 - 01:01:59 EST


On Mon, May 27, 2024 at 10:46 AM Arnaldo Carvalho de Melo
<acme@xxxxxxxxxx> wrote:
>
> On Mon, May 27, 2024 at 02:28:32PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, May 27, 2024 at 02:04:54PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Mon, May 27, 2024 at 02:02:33PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Mon, May 27, 2024 at 12:15:19PM +0200, Artem Savkov wrote:
> > > > > Add -M/--metrics option to perf-record providing a shortcut to record
> > > > > metrics and metricgroups. This option mirrors the one in perf-stat.
> >
> > > > > Suggested-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> > > > > Signed-off-by: Artem Savkov <asavkov@xxxxxxxxxx>
>
> > How did you test this?
>
> > The idea, from my notes, was to be able to have extra columns in 'perf
> > report' with things like IPC and other metrics, probably not all metrics
> > will apply. We need to find a way to find out which ones are OK for that
> > purpose, for instance:
>
> One that may make sense:
>
> root@number:~# perf record -M tma_fb_full
> ^C[ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 3.846 MB perf.data (21745 samples) ]
>
> root@number:~# perf evlist
> cpu_core/CPU_CLK_UNHALTED.THREAD/
> cpu_core/L1D_PEND_MISS.FB_FULL/
> dummy:u
> root@number:~#
>
> But then we need to read both to do the math, maybe something like:
>
> root@number:~# perf record -e '{cpu_core/CPU_CLK_UNHALTED.THREAD/,cpu_core/L1D_PEND_MISS.FB_FULL/}:S'
> ^C[ perf record: Woken up 40 times to write data ]
> [ perf record: Captured and wrote 14.640 MB perf.data (219990 samples) ]
>
> root@number:~# perf script | head
> cc1plus 1339704 [000] 36028.995981: 2011389 cpu_core/CPU_CLK_UNHALTED.THREAD/: 1097303 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
> cc1plus 1339704 [000] 36028.995981: 26231 cpu_core/L1D_PEND_MISS.FB_FULL/: 1097303 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
> cc1plus 1340011 [001] 36028.996008: 2004568 cpu_core/CPU_CLK_UNHALTED.THREAD/: 8c23b4 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
> cc1plus 1340011 [001] 36028.996008: 20113 cpu_core/L1D_PEND_MISS.FB_FULL/: 8c23b4 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
> clang 1340462 [002] 36028.996043: 2007356 cpu_core/CPU_CLK_UNHALTED.THREAD/: ffffffffb43b045d release_pages+0x3dd ([kernel.kallsyms])
> clang 1340462 [002] 36028.996043: 23481 cpu_core/L1D_PEND_MISS.FB_FULL/: ffffffffb43b045d release_pages+0x3dd ([kernel.kallsyms])
> cc1plus 1339622 [003] 36028.996066: 2004148 cpu_core/CPU_CLK_UNHALTED.THREAD/: 760874 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
> cc1plus 1339622 [003] 36028.996066: 31935 cpu_core/L1D_PEND_MISS.FB_FULL/: 760874 [unknown] (/usr/libexec/gcc/x86_64-pc-linux-gnu/13/cc1plus)
> as 1340513 [004] 36028.996097: 2005052 cpu_core/CPU_CLK_UNHALTED.THREAD/: ffffffffb4491d65 __count_memcg_events+0x55 ([kernel.kallsyms])
> as 1340513 [004] 36028.996097: 45084 cpu_core/L1D_PEND_MISS.FB_FULL/: ffffffffb4491d65 __count_memcg_events+0x55 ([kernel.kallsyms])
> root@number:~#
>
> root@number:~# perf report --stdio -F +period | head -20
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 219K of events 'anon group { cpu_core/CPU_CLK_UNHALTED.THREAD/, cpu_core/L1D_PEND_MISS.FB_FULL/ }'
> # Event count (approx.): 216528524863
> #
> # Overhead Period Command Shared Object Symbol
> # ................ .................... ......... ................. ...................................
> #
> 4.01% 1.09% 8538169256 39826572 podman [kernel.kallsyms] [k] native_queued_spin_lock_slowpath
> 1.35% 1.17% 2863376078 42829266 cc1plus cc1plus [] 0x00000000003f6bcc
> 0.94% 0.78% 1990639149 28408591 cc1plus cc1plus [] 0x00000000003f6be4
> 0.65% 0.17% 1375916283 6109515 podman [kernel.kallsyms] [k] _raw_spin_lock_irqsave
> 0.61% 0.99% 1304418325 36198834 cc1plus [kernel.kallsyms] [k] get_mem_cgroup_from_mm
> 0.52% 0.42% 1103054030 15427418 cc1plus cc1plus [] 0x0000000000ca6c69
> 0.51% 0.17% 1094200572 6299289 podman [kernel.kallsyms] [k] psi_group_change
> 0.42% 0.41% 893633315 14778675 cc1plus cc1plus [] 0x00000000018afafe
> 0.42% 1.29% 887664793 47046952 cc1plus [kernel.kallsyms] [k] asm_exc_page_fault
> root@number:~#
>
> That 'tma_fb_full' metric then would be another column, calculated from
> the sampled components of its metric equation:
>
> root@number:~# perf list tma_fb_full | head
>
> Metric Groups:
>
> MemoryBW: [Grouping from Top-down Microarchitecture Analysis Metrics spreadsheet]
> tma_fb_full
> [This metric does a *rough estimation* of how often L1D Fill Buffer
> unavailability limited additional L1D miss memory access requests to
> proceed]
>
> TopdownL4: [Metrics for top-down breakdown at level 4]
> root@number:~#
>
> This is roughly what we brainstormed, to support metrics in other tools
> than 'perf stat' but we need to check the possibilities and limitations
> of such an idea, hopefully this discussion will help with that,
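
To make the "another column" idea above concrete, here is a rough Python
sketch that derives a per-symbol value from the two period columns of a
grouped sample. Big assumptions: the plain ratio FB_FULL period / cycles
period is only a stand-in, not the real tma_fb_full formula (which is not
shown in this thread), and `ratio_column` is a made-up helper name. The
periods are copied from three rows of the `perf report` output quoted above.

```python
def ratio_column(rows):
    """Append fb_full_period / cycles_period to each (symbol, cycles, fb_full) row.

    Assumption: a plain ratio stands in for the metric expression.
    """
    out = []
    for symbol, cycles, fb_full in rows:
        # Guard against a symbol with no cycles period at all.
        ratio = fb_full / cycles if cycles else float("nan")
        out.append((symbol, cycles, fb_full, ratio))
    return out


# (symbol, CPU_CLK_UNHALTED.THREAD period, L1D_PEND_MISS.FB_FULL period)
rows = [
    ("native_queued_spin_lock_slowpath", 8538169256, 39826572),
    ("get_mem_cgroup_from_mm", 1304418325, 36198834),
    ("asm_exc_page_fault", 887664793, 47046952),
]

for symbol, cycles, fb_full, ratio in ratio_column(rows):
    print(f"{ratio:10.4%}  {symbol}")
```

Whether a ratio built from sampled periods is accurate enough per symbol is
exactly the kind of limitation that would need checking.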

Putting metrics next to code in perf report/annotate sounds good to
me; opening all of a metric's events as if we want to sample on each
of them, less so. We don't yet have metrics working with
`perf stat record`. I think Kan may have volunteered for that, and it
seems more urgent than expanding `perf record`. Presumably the way the
metric would be recorded for that could also benefit this effort.

If you look at the TMA metrics, a number of them have a "Sample with"
hint in their description. For example:
```
$ perf list -v
..
tma_branch_mispredicts
[This metric represents fraction of slots the CPU has wasted
due to Branch Misprediction.
These slots are either wasted by uops fetched from an
incorrectly speculated program path;
or stalls when the out-of-order part of the machine needs to
recover its state from a
speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES.
Related metrics:
tma_info_bad_spec_branch_misprediction_cost,tma_info_bottleneck_mispredictions,
tma_mispredicts_resteers]
..
```
It could be logical for `perf record -M tma_branch_mispredicts ...` to
be translated to `perf record -e BR_MISP_RETIRED.ALL_BRANCHES ...`
rather than to do any form of counting.
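
As a rough sketch of that translation (in Python only for brevity, the
real thing would live in perf's C code), the "Sample with:" hint can be
pulled out of the description text and turned into a `-e` list. The
helper names `sample_with_events` and `record_argv` are made up, and I'm
assuming the hint is a comma- or semicolon-separated event list that runs
up to "Related metrics:" or the end of the description:

```python
import re

# Tail of the tma_branch_mispredicts description shown above.
DESCRIPTION = """\
or stalls when the out-of-order part of the machine needs to
recover its state from a
speculative path. Sample with: BR_MISP_RETIRED.ALL_BRANCHES.
Related metrics:
tma_info_bad_spec_branch_misprediction_cost,tma_info_bottleneck_mispredictions,
tma_mispredicts_resteers]
"""


def sample_with_events(description):
    """Return the event names listed after 'Sample with:' (empty list if none)."""
    # Collapse line wraps so the hint and its event list sit on one line.
    text = " ".join(description.split())
    match = re.search(r"Sample with:\s*(.*?)(?:\s+Related metrics:|$)", text)
    if match is None:
        return []
    parts = re.split(r"[,;]", match.group(1))
    # Drop the sentence-ending '.' and a closing ']' without touching
    # the dots inside event names.
    return [p.strip(" .]") for p in parts if p.strip(" .]")]


def record_argv(events, extra_args=()):
    """Build a `perf record -e ...` argv from the extracted events."""
    return ["perf", "record", "-e", ",".join(events), *extra_args]


events = sample_with_events(DESCRIPTION)
print(" ".join(record_argv(events, ["--", "sleep", "1"])))
```

Metrics without a "Sample with:" hint give an empty list here, so -M would
need some fallback (or an error) for those.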

Thanks,
Ian