Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
From: Namhyung Kim
Date: Wed Jan 15 2025 - 17:15:00 EST
On Tue, Jan 14, 2025 at 03:55:47PM -0800, Ian Rogers wrote:
> On Tue, Jan 14, 2025 at 11:29 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> >
> > On Fri, Jan 10, 2025 at 11:18:53AM -0800, Ian Rogers wrote:
> > > On Fri, Jan 10, 2025 at 10:55 AM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > >
> > > > On Thu, Jan 09, 2025 at 08:44:38PM -0800, Ian Rogers wrote:
> > > > > On Thu, Jan 9, 2025 at 5:25 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > > > > Whilst for many tools it is an expected behavior that failure to open
> > > > > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > > > > legacy events and then failed to rename such events on a server uncore
> > > > > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > > > > open the event on all PMUs that advertise/"have" the event, this
> > > > > > > yielded failures when trying to make the priority of legacy and
> > > > > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > > > > legacy event user on ARM hardware may find their event opened on an
> > > > > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > > > > such events which this patch implements. Rather than have the skipping
> > > > > > > conditional on running on ARM, the skipping is done on all
> > > > > > > architectures as such a fundamental behavioral difference could lead
> > > > > > > to problems with tools built/depending on perf.
> > > > > > >
> > > > > > > An example of perf record failing to open events on x86 is:
> > > > > > > ```
> > > > > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > > > > Error:
> > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > >
> > > > > > > Error:
> > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > >
> > > > > > > Error:
> > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > > > > >
> > > > > > I'm afraid this can be too noisy.
> > > > >
> > > > > The intention is to be noisy:
> > > > > 1) it matches the existing behavior, anything else is potentially a regression;
> > > >
> > > > Well.. I think you're changing the behavior. :) Also currently it just
> > > > fails on the first event so it won't be too much noisy.
> > > >
> > > > $ perf record -e data_read,data_write,LLC-prefetch-read -a sleep 0.1
> > > > event syntax error: 'data_read,data_write,LLC-prefetch-read'
> > > > \___ Bad event name
> > > >
> > > > Unable to find event on a PMU of 'data_read'
> > > > Run 'perf list' for a list of valid events
> > > >
> > > > Usage: perf record [<options>] [<command>]
> > > > or: perf record [<options>] -- <command> [<options>]
> > > >
> > > > -e, --event <event> event selector. use 'perf list' to list available events
> > >
> > > Fwiw, this error is an event parsing error not an event opening error.
> > > You need to select an uncore event, I was using data_read which exists
> > > in the uncore_imc_free_running PMUs on Intel tigerlake. Here is the
> > > existing error message:
> > > ```
> > > $ perf record -e data_read -a true
> > > Error:
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > > ```
> > > and here it with the series:
> > > ```
> > > $ perf record -e data_read -a true
> > > Error:
> > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
> > > which will be removed.
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > >
> > > Error:
> > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1'
> > > which will be removed.
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > >
> > > Error:
> > > Failure to open any events for recording.
> > > ```
> > > and here is what it would be with pr_debug:
> > > ```
> > > $ perf record -e data_read -a true
> > > Error:
> > > Failure to open any events for recording.
> > > ```
> > > I believe this last output is worst because:
> > > 1) If not all events fail to open there is no error reported unless I
> > > know to run with -v, which will also bring a bunch more noise with it,
> >
> > I suggested to add a warning if any (not all) of events failed to open.
> >
> > "Removed some unsupported events, use -v for details."
> >
> >
> > > 2) I don't see the PMU / event name and "Invalid argument" indicating
> > > what has gone wrong again unless I know to run with -v and get all the
> > > verbose noise with that.
> >
> > I don't think single -v adds a lot of noise in the output.
> >
> > >
> > > Yes it is noisy on 1 platform for 1 event due to an ARM PMU event name
> > > bug that ARM should have long ago fixed. That should be fixed rather
> > > than hiding errors and making users think they are recording samples
> > > when silently they're not - or they need to search through verbose
> > > output to try to find out if something broke.
> >
> > I'm not sure if it's a bug in the driver. It happens because perf tool
> > changed the way it finds events - it used to look at the core PMUs only
> > if no PMU name was given, but now it searches every PMU, right?
>
> So there is the ARM bug in the PMU driver that caused an issue with
> the hybrid fixes done because of wanting to have metrics work for
> hybrid. The bug is reported here:
> https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@xxxxxxxxx/
I'm not sure if it's agreed to be called a PMU bug.
My understanding is it's the change in the perf tool that break it.
> The events are apple_icestorm_pmu/cycles/ and
> apple_firestorm_pmu/cycles/. The issue is that prior to fixing hybrid
> the ARM PMUs looked like uncore PMUs and couldn't open a legacy event,
> which was fine as they has sysfs events. When hybrid was fixed in the
> tool, the tool would then try to open apple_icestorm_pmu/cycles/ and
> apple_firestorm_pmu/cycles/ as legacy events - legacy having priority
> over sysfs/json back then. The legacy mapping was broken in the PMU
I don't know why you want to use legacy events (PERF_TYPE_HARDWARE)
when it has PMU in the event name and the PMU has a different type
enconding.
> driver. Now were everything working as intended, just the cycles event
> would be specified on the command line and the event would be wildcard
> opened on the apple_icestorm_pmu and apple_firestorm_pmu. I believe
> this way would already use a legacy encoding and so to work around the
> PMU driver bug people were specifying the PMU name to get the sysfs
> encoding, but that only worked as the PMUs appeared to be uncore.
>
> > >
> > > > > 2) it only happens if trying to record on a PMU/event that doesn't
> > > > > support recording, something that is currently an error and so we're
> > > > > not motivated to change the behavior as no-one should be using it;
> > > >
> > > > It was caught by Linus, so we know at least one (very important) user.
> > >
> > > If they care enough then specifying the PMU with the event will avoid
> > > any warning and has always been a fix for this issue. It was the first
> > > proposed workaround for Linus.
> >
> > I guess that's what Linus said regression.
>
> But a regression where? The tool's behavior is pretty clear, no PMU
> the event will be tried on every PMU, give it a PMU and the event will
> only be tried on that PMU, give it a PMU without a suffix and the
> event will be opened on all PMUs that match the name with different
> suffixes.
It may be clear to you but may not be to others. When did the change
come in? Before the change, people assume it would only try core PMU.
And the people can still have the idea if they haven't used any affected
events. I guess many users would use legacy events only.
> I dislike the idea of cpu-cycles implicitly being just for
> core PMUs, but cpu_cycles being for all PMUs as the hyphen is a legacy
> name and the underscore not.
That's because we specifically picked some names to be used as a legacy
event. And it worked well. If some PMU didn't use the name, it's their
fault and they should use PMU event with their name.
> I dislike the idea of specifying a PMU
> with uncore events as uncore events often already have a PMU within
> their event name and it also breaks the universe.
Does the 'universe' mean 'metric'?
Having PMU name in the event name is their choice. Do you see this in
sysfs or JSON? Or both?
Actually I don't like the idea of trying every PMU if no PMU name is
given. But you said reverting it would break metrics (I don't know if
there are other users rely on this behavior). Maybe can we handle
metrics differently?
I guess we can put JSON events and metrics without PMU in a global name
space so that it can be searched (after legacy name) when users don't
specify PMUs in the command line. Otherwise it should have PMU name
and sysfs event (then JSON events with PMU name) can be searched.
Does that make sense?
> When trying to find
> out what people mean by event names being implicitly associated with
> PMUs I get told I'm throwing out "what ifs," when all I'm doing is
> reading the code (that I wrote and I'm trying to fix) and trying to
> figure out what behavior people want. What I don't want is
> inconsistencies, events behaving differently in different scenarios
> and the perf output's use of event names being inconsistent with the
> parsing. RISC-V and ARM have wanted the syfs/json over legacy
> priority, so I'm trying to get that landed.
I'm not sure now RISC-V and ARM want it. Or it needs to be more
specific what they want exactly.
>
> Ultimately the original regression comes back to the ARM SLC PMU
> advertising a cycles event when it should have been named cpu_cycles,
> if for no other reason than uniformity with the bus_cycles name on the
> same PMU. The change in perf's wildcard behavior exposed the latent
> bug, that doesn't make the SLC PMU's event name not a bug. The change
> here is to make seeing that bug non-terminal to running the program.
I don't see it's a bug if uncore PMUs have an event named 'cycles' or
whatever. It's just because perf record wanted to use it and that's
entirely tool's choice.
Thanks,
Namhyung
>
> > >
> > > > > 3) for the wildcard case the only offender is ARM's SLC PMU and the
> > > > > appropriate fix there has always been to make the CPU cycle's event
> > > > > name match the bus_cycles event name by calling it cpu_cycles -
> > > > > something that doesn't conflict with a core PMU event name, the thing
> > > > > that has introduced all these problems, patches, long email exchanges,
> > > > > unfixed inconsistencies, etc.. If the errors aren't noisy then there
> > > > > is little motivation for the ARM SLC PMU's event name to be fixed.
> > > >
> > > > I understand your concern but I'm not sure it's the best way to fix the
> > > > issue.
> > >
> > > Right, I'm similarly concerned about hiding legitimate warning/error
> > > messages because of 1 event on 1 PMU on 1 architecture because of how
> > > perf gets driven by 1 user. Yes, when you break you can wade through
> > > the verbose output but imo the verbose output was never intended to be
> > > used in that way.
> >
> > Well, the verbose output is to debug when something doesn't go well, no?
>
> The output isn't currently only enabled in verbose mode, so is this
> wrong? You will only get extra warnings with this change if you do
> anything wrong. For a hybrid system maybe you've gone from 1 warning
> to 2, I fail to see a big deal. Yes if you try to do perf record on an
> uncore server PMU with many instances you will potentially get many
> warnings, but the behavior before and after is to fail and the user is
> likely to figure out what the fix is in both cases, with more errors
> they may appreciate better that the event was getting opened on many
> PMUs. The trend for event parsing errors is to have more error
> messages. We went from 1 to 2 in commit
> a910e4666d61712840c78de33cc7f89de8affa78 and from 2 to many in commit
> fd7b8e8fb20f51d60dfee7792806548f3c6a4c2c. The trend isn't to try to
> move things into verbose only output and for things to silently (or
> with little detail) fail for the user.
>
> Thanks,
> Ian