Re: [PATCH v1 2/2] perf script: Fix for `perf script +F metric` with leader sampling

From: James Clark
Date: Tue Jul 23 2024 - 11:27:04 EST




On 23/07/2024 3:57 pm, Ian Rogers wrote:
On Tue, Jul 23, 2024 at 7:41 AM James Clark <james.clark@xxxxxxxxxx> wrote:



On 20/07/2024 8:45 am, Ian Rogers wrote:
Andi Kleen reported a regression where `perf script +F metric` would
crash. With this change the output is:

```
$ perf record -a -e '{cycles,instructions}:S' perf bench mem memcpy

21.229620 GB/sec

15.751008 GB/sec

16.009221 GB/sec
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.945 MB perf.data (294 samples) ]
$ perf --no-pager script -F +metric
perf 1912464 [000] 814503.473101: 6325 cycles: ffffffff8548d64a native_write_msr+0xa ([kernel.kallsyms])
perf 1912464 [000] 814503.473101: metric: 0.06 insn per cycle
perf 1912464 [000] 814503.473101: 351 instructions: ffffffff8548d64a native_write_msr+0xa ([kernel.kallsyms])
perf 1912464 [000] 814503.473101: metric: 0.03 insn per cycle
...
```

For some reason I only get the metric: lines when I record with -a. I
noticed this because Andi's test doesn't use -a so it fails.

I'm not sure if that's expected or it's related to your disclaimer below?

It is. When you don't do -a the cpu map just contains -1 and for some
reason it is busted. The whole indirections to arrays of arrays,
counts, stats, aggregations, with indices into various other arrays
and a lack of helpers. The code works for perf stat, but there is a
lot of complexity that I don't fully grok in that. Here I've tried to
kind of break down what the code is trying to do in the comments, but
the old code never did sample_read_group__for_each so was is broken
with leader sampling? Is the leader sampling pretending the read
counts are periods and calling process sample multiple times. Andi
likely knows this code better than me so I was hoping he could fix it
up. We may want to take the patches anyway in order to not have a
segv.

Thanks,
Ian


Yeah I suppose it's strictly better now without the segfault. Could you pull in the test and update it to add -a? At least then that behavior will be locked down and we can extend it later without -a.

I also tested Andi's V5 and still got the segfault.


The change fixes perf script to update counts and thereby aggregate
values which then get consumed by unchanged metric logic in the shadow
stat output. Note, it would be preferential to switch to json metrics.

Reported-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Closes: https://lore.kernel.org/linux-perf-users/20240713155443.1665378-1-ak@xxxxxxxxxxxxxxx/
Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather than saved_value")'
Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
---
The code isn't well tested nor does it support non-leader sampling
reading of counts based on periods that seemed to present in the
previous code. Sending out for the sake of discussion. Andi's changes
added a test and that should certainly be added.
---
tools/perf/builtin-script.c | 114 +++++++++++++++++++++++++++++-------
1 file changed, 93 insertions(+), 21 deletions(-)