Re: [PATCH] perf record: add a shortcut for metrics

From: Arnaldo Carvalho de Melo
Date: Tue May 28 2024 - 10:47:26 EST


On Tue, May 28, 2024 at 01:45:25PM +0200, Artem Savkov wrote:
> On Mon, May 27, 2024 at 02:28:29PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, May 27, 2024 at 02:04:54PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Mon, May 27, 2024 at 02:02:33PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Mon, May 27, 2024 at 12:15:19PM +0200, Artem Savkov wrote:
> > > > > Add -M/--metrics option to perf-record providing a shortcut to record
> > > > > metrics and metricgroups. This option mirrors the one in perf-stat.
> >
> > > > > Suggested-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> > > > > Signed-off-by: Artem Savkov <asavkov@xxxxxxxxxx>

<SNIP>

> > How did you test this?
> >
> > I'm trying:
> >
> > perf list metric
> >
> > pick a metric then:
> >
> > perf record -M tma_core_bound
> >
> > And it gets in a long loop doing perf_event_open() calls...
>
> [snip]
>
> > (gdb) bt
> > #0 0x00007ffff6f21804 in close () from /lib64/libc.so.6
> > #1 0x000000000061fbd2 in perf_evsel__close_fd_cpu (evsel=0xdab470, cpu_map_idx=6) at evsel.c:188
> > #2 0x000000000061fc22 in perf_evsel__close_fd (evsel=0xdab470) at evsel.c:197
> > #3 0x000000000061fc9b in perf_evsel__close (evsel=0xdab470) at evsel.c:211
> > #4 0x00000000004e0b5f in evlist.reset_weak_group ()
> > #5 0x0000000000423bb9 in __cmd_record.constprop.0 ()
> > #6 0x00000000004276c5 in cmd_record ()
> > #7 0x00000000004c4579 in run_builtin ()
> > #8 0x00000000004c4889 in handle_internal_command ()
> > #9 0x0000000000410e57 in main ()
> > (gdb) c
> > Continuing.
> > ^C
> > Program received signal SIGINT, Interrupt.
> > 0x00007ffff6f21804 in close () from /lib64/libc.so.6
> > (gdb)
> >
> > So you should investigate this further.
>
> I tried a bunch of random metrics from perf list but didn't hit this.
>
> It spins forever in evlist__for_each_entry() loop in record__open() with
> the same error:
>
> Weak group for TOPDOWN.SLOTS/5 failed
>
> Looks like the culprit is one of those unsupported metrics, will
> investigate.

Right, when trying something new, in a way the pre-existing codebase
wasn't envisioned to be used, we may uncover latent problems. That
endless loop seems like something we want fixed :-)
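
Whatever the actual cause turns out to be, the property we want from the
open loop is that every retry makes progress. Below is a minimal toy
sketch of that idea, not the code in record__open(): struct sketch_event,
its fields and sketch_open_all() are all made-up names for illustration.
A failure inside a weak group drops the grouping and retries once, and a
failure while ungrouped gives up on the event, so no event can be retried
forever.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an event; none of this is a perf data structure. */
struct sketch_event {
	bool opens_in_group;	/* would the open succeed as a group member? */
	bool opens_alone;	/* would it succeed once ungrouped? */
	bool grouped;		/* still part of a weak group */
	bool skipped;		/* we gave up on this event */
};

/*
 * Try to open every event. A failure inside a weak group clears 'grouped'
 * and retries; a failure while ungrouped marks the event skipped. Each
 * event therefore costs at most two attempts, so the loop always ends.
 */
size_t sketch_open_all(struct sketch_event *evs, size_t nr, size_t *attempts)
{
	size_t opened = 0;

	*attempts = 0;
	for (size_t i = 0; i < nr; i++) {
		struct sketch_event *ev = &evs[i];

		for (;;) {
			bool ok = ev->grouped ? ev->opens_in_group : ev->opens_alone;

			(*attempts)++;
			if (ok) {
				opened++;
				break;
			}
			if (ev->grouped) {
				ev->grouped = false;	/* progress: cannot repeat */
				continue;
			}
			ev->skipped = true;		/* no fallback left */
			break;
		}
	}
	return opened;
}
```

The point of the sketch is only the termination argument: the retry path
changes state that the retry condition depends on, so the number of
attempts is bounded by twice the number of events.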

> > The idea, from my notes, was to be able to have extra columns in 'perf
> > report' with things like IPC and other metrics, probably not all metrics
> > will apply. We need to find a way to find out which ones are OK for that
> > purpose, for instance:
> >
> > Opening: cpu_core/topdown-bad-spec/
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 4 (cpu_core)
> > size 136
> > config 0x8100 (topdown-bad-spec)
> > { sample_period, sample_freq } 4000
> > sample_type IP|TID|TIME|CPU|PERIOD|IDENTIFIER
> > read_format ID|LOST
> > disabled 1
> > inherit 1
> > freq 1
> > sample_id_all 1
> > exclude_guest 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8
> > sys_perf_event_open failed, error -22
> > switching off PERF_FORMAT_LOST support
> > Opening: cpu_core/topdown-bad-spec/
>
> Is it just metrics containing unsupported events that need to be skipped
> or are there other cases that wouldn't make much sense? If the latter,
> maybe it will be easier to just tag the ones that are supported (or not) in
> pmu-events?

Maybe we can use some criteria to look at the metric and filter out
things that are not working right now? As you go on studying the
codebase you will figure out the reasons: sometimes it's a bug (the
forever loop above), sometimes it plainly doesn't make sense and we just
skip it. That leaves things like IPC: we have instructions, we have
cycles, that is what is needed for IPC, so it makes sense and we should
have an IPC column when collecting both cycles and instructions, just
like it has been done in an ad hoc way for IPC in perf stat since
forever.
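
To make the IPC case concrete, here is a rough sketch of one possible
criterion: a metric is usable when every event it needs is among the
recorded ones, and IPC is then just instructions divided by cycles. The
helper names metric_events_present() and ipc() and the use of plain event
name strings are assumptions for illustration, not existing perf
functions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical check: is every event the metric needs in the recorded set? */
bool metric_events_present(const char *const *needed, size_t nr_needed,
			   const char *const *recorded, size_t nr_recorded)
{
	for (size_t i = 0; i < nr_needed; i++) {
		bool found = false;

		for (size_t j = 0; j < nr_recorded; j++) {
			if (!strcmp(needed[i], recorded[j])) {
				found = true;
				break;
			}
		}
		if (!found)
			return false;	/* one missing event rules the metric out */
	}
	return true;
}

/* IPC = instructions / cycles; report 0.0 when no cycles were counted. */
double ipc(uint64_t instructions, uint64_t cycles)
{
	return cycles ? (double)instructions / (double)cycles : 0.0;
}
```

With "instructions" and "cycles" both recorded the check passes and an
IPC column can be filled in; with only "cycles" it fails and the column
would simply be left out.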

People want to have those columns in 'perf report' and 'perf top'.

- Arnaldo