Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
From: Ian Rogers
Date: Wed Jan 15 2025 - 17:40:59 EST
On Wed, Jan 15, 2025 at 2:14 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> On Tue, Jan 14, 2025 at 03:55:47PM -0800, Ian Rogers wrote:
> > On Tue, Jan 14, 2025 at 11:29 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > >
> > > On Fri, Jan 10, 2025 at 11:18:53AM -0800, Ian Rogers wrote:
> > > > On Fri, Jan 10, 2025 at 10:55 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Thu, Jan 09, 2025 at 08:44:38PM -0800, Ian Rogers wrote:
> > > > > > On Thu, Jan 9, 2025 at 5:25 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > > > > > Whilst for many tools it is an expected behavior that failure to open
> > > > > > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > > > > > legacy events and then failed to rename such events on a server uncore
> > > > > > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > > > > > open the event on all PMUs that advertise/"have" the event, this
> > > > > > > > yielded failures when trying to make the priority of legacy and
> > > > > > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > > > > > legacy event user on ARM hardware may find their event opened on an
> > > > > > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > > > > > such events which this patch implements. Rather than have the skipping
> > > > > > > > conditional on running on ARM, the skipping is done on all
> > > > > > > > architectures as such a fundamental behavioral difference could lead
> > > > > > > > to problems with tools built/depending on perf.
> > > > > > > >
> > > > > > > > An example of perf record failing to open events on x86 is:
> > > > > > > > ```
> > > > > > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > > > > > Error:
> > > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > > >
> > > > > > > > Error:
> > > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > > >
> > > > > > > > Error:
> > > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > > > > > >
> > > > > > > I'm afraid this can be too noisy.
> > > > > >
> > > > > > The intention is to be noisy:
> > > > > > 1) it matches the existing behavior, anything else is potentially a regression;
> > > > >
> > > > > Well.. I think you're changing the behavior. :) Also currently it just
> > > > > fails on the first event so it won't be too much noisy.
> > > > >
> > > > > $ perf record -e data_read,data_write,LLC-prefetch-read -a sleep 0.1
> > > > > event syntax error: 'data_read,data_write,LLC-prefetch-read'
> > > > > \___ Bad event name
> > > > >
> > > > > Unable to find event on a PMU of 'data_read'
> > > > > Run 'perf list' for a list of valid events
> > > > >
> > > > > Usage: perf record [<options>] [<command>]
> > > > > or: perf record [<options>] -- <command> [<options>]
> > > > >
> > > > > -e, --event <event> event selector. use 'perf list' to list available events
> > > >
> > > > Fwiw, this error is an event parsing error not an event opening error.
> > > > You need to select an uncore event, I was using data_read which exists
> > > > in the uncore_imc_free_running PMUs on Intel tigerlake. Here is the
> > > > existing error message:
> > > > ```
> > > > $ perf record -e data_read -a true
> > > > Error:
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > > for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > > ```
> > > > and here it with the series:
> > > > ```
> > > > $ perf record -e data_read -a true
> > > > Error:
> > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
> > > > which will be removed.
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > > for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > >
> > > > Error:
> > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1'
> > > > which will be removed.
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > > for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > >
> > > > Error:
> > > > Failure to open any events for recording.
> > > > ```
> > > > and here is what it would be with pr_debug:
> > > > ```
> > > > $ perf record -e data_read -a true
> > > > Error:
> > > > Failure to open any events for recording.
> > > > ```
> > > > I believe this last output is worst because:
> > > > 1) If not all events fail to open there is no error reported unless I
> > > > know to run with -v, which will also bring a bunch more noise with it,
> > >
> > > I suggested to add a warning if any (not all) of events failed to open.
> > >
> > > "Removed some unsupported events, use -v for details."
> > >
> > >
> > > > 2) I don't see the PMU / event name and "Invalid argument" indicating
> > > > what has gone wrong again unless I know to run with -v and get all the
> > > > verbose noise with that.
> > >
> > > I don't think single -v adds a lot of noise in the output.
> > >
> > > >
> > > > Yes it is noisy on 1 platform for 1 event due to an ARM PMU event name
> > > > bug that ARM should have long ago fixed. That should be fixed rather
> > > > than hiding errors and making users think they are recording samples
> > > > when silently they're not - or they need to search through verbose
> > > > output to try to find out if something broke.
> > >
> > > I'm not sure if it's a bug in the driver. It happens because perf tool
> > > changed the way it finds events - it used to look at the core PMUs only
> > > if no PMU name was given, but now it searches every PMU, right?
> >
> > So there is the ARM bug in the PMU driver that caused an issue with
> > the hybrid fixes done because of wanting to have metrics work for
> > hybrid. The bug is reported here:
> > https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@xxxxxxxxx/
>
> I'm not sure if it's agreed to be called a PMU bug.
> My understanding is it's the change in the perf tool that break it.
>
>
> > The events are apple_icestorm_pmu/cycles/ and
> > apple_firestorm_pmu/cycles/. The issue is that prior to fixing hybrid
> > the ARM PMUs looked like uncore PMUs and couldn't open a legacy event,
> > which was fine as they has sysfs events. When hybrid was fixed in the
> > tool, the tool would then try to open apple_icestorm_pmu/cycles/ and
> > apple_firestorm_pmu/cycles/ as legacy events - legacy having priority
> > over sysfs/json back then. The legacy mapping was broken in the PMU
>
> I don't know why you want to use legacy events (PERF_TYPE_HARDWARE)
> when it has PMU in the event name and the PMU has a different type
> enconding.
Historically we used legacy then sysfs/json. When Intel did the hybrid
work they kept this priority for legacy events when the hybrid PMU was
specified. Intel change, Arnaldo and Jiri reviewing. My belief in why
it was done this way is that not every legacy event has a sysfs/json
encoding, and the priority was just inheriting an existing behavior.
> > driver. Now were everything working as intended, just the cycles event
> > would be specified on the command line and the event would be wildcard
> > opened on the apple_icestorm_pmu and apple_firestorm_pmu. I believe
> > this way would already use a legacy encoding and so to work around the
> > PMU driver bug people were specifying the PMU name to get the sysfs
> > encoding, but that only worked as the PMUs appeared to be uncore.
> >
> > > >
> > > > > > 2) it only happens if trying to record on a PMU/event that doesn't
> > > > > > support recording, something that is currently an error and so we're
> > > > > > not motivated to change the behavior as no-one should be using it;
> > > > >
> > > > > It was caught by Linus, so we know at least one (very important) user.
> > > >
> > > > If they care enough then specifying the PMU with the event will avoid
> > > > any warning and has always been a fix for this issue. It was the first
> > > > proposed workaround for Linus.
> > >
> > > I guess that's what Linus said regression.
> >
> > But a regression where? The tool's behavior is pretty clear, no PMU
> > the event will be tried on every PMU, give it a PMU and the event will
> > only be tried on that PMU, give it a PMU without a suffix and the
> > event will be opened on all PMUs that match the name with different
> > suffixes.
>
> It may be clear to you but may not be to others. When did the change
> come in? Before the change, people assume it would only try core PMU.
> And the people can still have the idea if they haven't used any affected
> events. I guess many users would use legacy events only.
Who are these others? The only affected event is a cycles event on an
ARM SLC PMU. What gets fixed? Potentially Apple-M hardware where
legacy events have historically been broken. Rather than others
complaining I think there's a much greater number of others who will
be happy about the fix.
> > I dislike the idea of cpu-cycles implicitly being just for
> > core PMUs, but cpu_cycles being for all PMUs as the hyphen is a legacy
> > name and the underscore not.
>
> That's because we specifically picked some names to be used as a legacy
> event. And it worked well. If some PMU didn't use the name, it's their
> fault and they should use PMU event with their name.
I'm not sure this is the case. If you look at the legacy event names
it was typical they included something like a PMU before the first
hyphen. What does cpu- mean for hybrid? Why does LLC mean L2 when
typical LLCs these days are L3? I think ideally we'd delete the legacy
events and fix the missing events by explicitly putting them into
sysfs/json. I don't see that happening soon.
> > I dislike the idea of specifying a PMU
> > with uncore events as uncore events often already have a PMU within
> > their event name and it also breaks the universe.
>
> Does the 'universe' mean 'metric'?
No, I mean if I do:
$ perf stat -e data_read ...
it works today. Making all non-core events require a PMU means I need to type:
$ perf stat -e uncore_imc_free_running/data_read/ ...
This is true for all non-core events with your proposal. I've
previously advised you that the former behavior is what perf's command
line completion of event names assumes.
> Having PMU name in the event name is their choice. Do you see this in
> sysfs or JSON? Or both?
>
> Actually I don't like the idea of trying every PMU if no PMU name is
> given. But you said reverting it would break metrics (I don't know if
> there are other users rely on this behavior). Maybe can we handle
> metrics differently?
>
> I guess we can put JSON events and metrics without PMU in a global name
> space so that it can be searched (after legacy name) when users don't
> specify PMUs in the command line. Otherwise it should have PMU name
> and sysfs event (then JSON events with PMU name) can be searched.
>
> Does that make sense?
So my intention for metrics is that the events there work as events
would work for perf stat. Not least this simplifies testing and
creating metrics.
There are things we can do with the search order of events, I'd like
to make it so users can create their own events, but this is getting
off topic.
> > When trying to find
> > out what people mean by event names being implicitly associated with
> > PMUs I get told I'm throwing out "what ifs," when all I'm doing is
> > reading the code (that I wrote and I'm trying to fix) and trying to
> > figure out what behavior people want. What I don't want is
> > inconsistencies, events behaving differently in different scenarios
> > and the perf output's use of event names being inconsistent with the
> > parsing. RISC-V and ARM have wanted the syfs/json over legacy
> > priority, so I'm trying to get that landed.
>
> I'm not sure now RISC-V and ARM want it. Or it needs to be more
> specific what they want exactly.
In which case let's make the priority be legacy then sysfs/json, I'm
happy with that. We can revert the changes that Mark Rutland and the
Apple-M folks pushed for. We can tell RISC-V they're not being
specific enough with their need. I don't think that's as good an
alternative as the changes here, but if it works for you...
> >
> > Ultimately the original regression comes back to the ARM SLC PMU
> > advertising a cycles event when it should have been named cpu_cycles,
> > if for no other reason than uniformity with the bus_cycles name on the
> > same PMU. The change in perf's wildcard behavior exposed the latent
> > bug, that doesn't make the SLC PMU's event name not a bug. The change
> > here is to make seeing that bug non-terminal to running the program.
>
> I don't see it's a bug if uncore PMUs have an event named 'cycles' or
> whatever. It's just because perf record wanted to use it and that's
> entirely tool's choice.
It's a bug because a wildcard match */cycles/ will match against the
SLC PMU's event and if users don't want to specify a PMU it breaks
perf record when wildcarding cycles. These changes lower the failure
to a warning, implementing the behavior proposed by Arnaldo:
https://lore.kernel.org/lkml/ZlY0F_lmB37g10OK@x1/
and has tags from Intel, ARM and Rivos (RISC-V). I intend to carry it
in Google's tree. If you want to require uncore events have PMUs in
their names, maintain all the metrics, etc. I don't plan on carrying
that but you have complete freedom to do what you see best.
Thanks,
Ian