Re: [GIT PULL] perf tools changes for v6.10

From: Ian Rogers
Date: Fri May 24 2024 - 23:48:23 EST


On Fri, May 24, 2024 at 7:02 PM Arnaldo Carvalho de Melo
<acme@xxxxxxxxxx> wrote:
>
> On Fri, May 24, 2024 at 10:55:11PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Fri, May 24, 2024 at 06:31:52PM -0700, Linus Torvalds wrote:
> > > On Tue, 21 May 2024 at 12:26, Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
> > > >
> > > > perf tools fixes and improvements for v6.10:
> > >
> > > This actually broke 'perf' completely for me on arm64.
> > >
> > > With a 6.9 version of 'perf', I can do this:
> > >
> > > perf record -e cycles:pp make -j199
> > >
> > > and it all works fine.
> > >
> > > With the current -git version, when I do the same, I instead get
> > >
> > > Error:
> > > cycles:pp: PMU Hardware doesn't support
> > > sampling/overflow-interrupts. Try 'perf stat'
> > >
> > > and after trying desperately to chase down what went wrong on the
> > > kernel side, I finally figured out that it wasn't a kernel change at
> > > all, it was the tooling that had changed.
> > >
> > > I did a 'git bisect', and it says
> > >
> > > 617824a7f0f73e4de325cf8add58e55b28c12493 is the first bad commit
> > > commit 617824a7f0f73e4de325cf8add58e55b28c12493
> > > Author: Ian Rogers <irogers@xxxxxxxxxx>
> > > Date: Mon Apr 15 23:15:25 2024 -0700
> > >
> > > perf parse-events: Prefer sysfs/JSON hardware events over legacy
> > >
> > > and very clearly this does *NOT* work at all for me.
> > >
> > > I didn't notice until now, simply because I had been busy with the
> > > merge window, so I hadn't been doing any profiles, but the merge
> > > window is calming down and the end is nigh, and I just wasted more
> > > time than I care to admit trying to figure out what went wrong in the
> > > kernel.
> > >
> > > And no, I don't speak JSON, and I have *no* idea what the legacy
> > > events are. Plus I'm not very familiar with the arm64 profiling etc
> > > anyway, so I'm just a clueless user here.
> > >
> > > I *can* confirm that just reverting that commit makes that trivial
> > > "perf record" work for me. So the bisect was good, and it reverts
> > > cleanly, but I don't know _why_ my arm64 setup hates it so much.

Thanks for the report. TL;DR try putting the PMU name with the event
name, so "cycles:pp" becomes "armv8_pmuv3_0/cycles/pp", where
armv8_pmuv3_0 is the name of the PMU from /sys/devices.

The perf tool has a number of inbuilt "legacy" event names, cycles is
one. Most events these days are either found in sysfs (this is on a
Raspberry Pi 5):
```
$ ls /sys/devices/armv8_cortex_a76/events/
br_mis_pred exc_return l1d_tlb_refill l2d_tlb
remote_access
br_mis_pred_retired exc_taken l1i_cache
l2d_tlb_refill stall_backend
br_pred inst_retired l1i_cache_refill l3d_cache
stall_frontend
br_retired inst_spec l1i_tlb
l3d_cache_allocate sw_incr
bus_access itlb_walk l1i_tlb_refill
l3d_cache_refill ttbr_write_retired
bus_cycles l1d_cache l2d_cache ll_cache_miss_rd
cid_write_retired l1d_cache_refill l2d_cache_allocate ll_cache_rd
cpu_cycles l1d_cache_wb l2d_cache_refill mem_access
dtlb_walk l1d_tlb l2d_cache_wb memory_error
```
or json that is turned into tables built into the perf tool.

ARM requested that sysfs and json events take priority over legacy
events. So 'armv8_cortex_a76/cycles/' should first try to open an
event in either sysfs or json, and if not present fallback on using
the legacy constants. Note that the event name has the PMU name first.
What should the behavior of the 'cycles' event with no PMU be? In
Linux 6.9 the behavior was that cycles without a PMU would be a legacy
encoding and only try to be on the core's PMU (generally cpu on Intel
or armv8... on ARM), but with a PMU it would prefer sysfs and json
tables.

RISC-V had asked that in the no PMU case they'd also like sysfs/json
to have priority so the driver could be more ignorant of event
encodings. It was also inconsistent that in Linux 6.9:

$ perf stat -e inst_retired.any true

was a sysfs/json event that we'd try to open on every PMU but:

$ perf stat -e instructions true

was a legacy event that would only be opened on the core PMU. (I'm
ignoring the complexity that BIG.little/hybrid adds).

The blamed patch does away with the inconsistency and makes it so that
legacy events are always the 2nd choice and we try to open every event
on every PMU. It is the 2nd point that I think is breaking you. I
think you have a PMU on your system with a cycles event, maybe an
uncore PMU with memory controller data or CXL, and the perf record is
trying to open the cycles event on that and the error message
correctly reports that sampling wouldn't be possible on that PMU.

Putting verbose flags on perf record:

$ perf record -vv ...

should hopefully give more breadcrumbs and confirm this. You could also do:

$ ls /sys/devices/*/events/cycles

If there are more than 1 sysfs cycles event then probably one of them
is the problem. Adding the PMU name removes the trying every PMU
behavior and so should be a fix. We could also change it so that when
we open multiple events for perf record we don't fail in a case like
this. Maybe it is a feature to fail.

Thanks,
Ian

> > That is a good data point, we probably could go with the revert, but I
> > think Ian submitted a few patches fixing this issue that came up close
> > to LSFMM/BPF and the merge window, so didn't have time to sit on
> > linux-next for a while, I'm looking those up now.
>
> Couldn't find it quickly, its late here, perhaps Ian can chime in and
> point those fixes here. I'll try and continue tomorrow.
>
> - Arnaldo
>
> > ARM64 eyes on this would also be good. Adding Mark Rutland and Leo Yan
> > to the CC list, maybe they can help us here with the best course of
> > action.