Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
From: Ian Rogers
Date: Fri Jan 10 2025 - 11:42:29 EST
On Fri, Jan 10, 2025 at 6:18 AM Arnaldo Carvalho de Melo
<acme@xxxxxxxxxx> wrote:
>
> Adding Linus to the CC list as he participated in this discussion in the
> past, so a heads up about changes in this area that are being further
> discussed.
Linus blocks my email so I'm not sure of the point.
> On Thu, Jan 09, 2025 at 05:25:03PM -0800, Namhyung Kim wrote:
> > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > Whilst for many tools it is an expected behavior that failure to open
> > > a perf event is a failure, ARM decided to name PMU events the same as
> > > legacy events and then failed to rename such events on a server uncore
> > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > open the event on all PMUs that advertise/"have" the event, this
> > > yielded failures when trying to make the priority of legacy and
> > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > legacy event user on ARM hardware may find their event opened on an
> > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > such events which this patch implements. Rather than have the skipping
> > > conditional on running on ARM, the skipping is done on all
> > > architectures as such a fundamental behavioral difference could lead
> > > to problems with tools built/depending on perf.
> > >
> > > An example of perf record failing to open events on x86 is:
> > > ```
> > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > Error:
> > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > >
> > > Error:
> > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > >
> > > Error:
> > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > The LLC-prefetch-read event is not supported.
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> >
> > I'm afraid this can be too noisy.
>
> Agreed.
>
> > > $ perf report --stats
> > > Aggregated stats:
> > > TOTAL events: 17255
> > > MMAP events: 284 ( 1.6%)
> > > COMM events: 1961 (11.4%)
> > > EXIT events: 1 ( 0.0%)
> > > FORK events: 1960 (11.4%)
> > > SAMPLE events: 87 ( 0.5%)
> > > MMAP2 events: 12836 (74.4%)
> > > KSYMBOL events: 83 ( 0.5%)
> > > BPF_EVENT events: 36 ( 0.2%)
> > > FINISHED_ROUND events: 2 ( 0.0%)
> > > ID_INDEX events: 1 ( 0.0%)
> > > THREAD_MAP events: 1 ( 0.0%)
> > > CPU_MAP events: 1 ( 0.0%)
> > > TIME_CONV events: 1 ( 0.0%)
> > > FINISHED_INIT events: 1 ( 0.0%)
> > > cycles stats:
> > > SAMPLE events: 87
> > > ```
> > >
> > > If all events fail to open then the perf record will fail:
> > > ```
> > > $ perf record -e LLC-prefetch-read true
> > > Error:
> > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > The LLC-prefetch-read event is not supported.
> > > Error:
> > > Failure to open any events for recording
> > > ```
> > >
> > > As an evlist may have dummy events that open when all command line
> > > events fail we ignore dummy events when detecting if at least some
> > > events open. This still permits the dummy event on its own to be used
> > > as a permission check:
> > > ```
> > > $ perf record -e dummy true
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 0.046 MB perf.data ]
> > > ```
> > > but allows failure when a dummy event is implicilty inserted or when
> > > there are insufficient permissions to open it:
> > > ```
> > > $ perf record -e LLC-prefetch-read -a true
> > > Error:
> > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > The LLC-prefetch-read event is not supported.
> > > Error:
> > > Failure to open any events for recording
> > > ```
> > >
> > > The issue with legacy events is that on RISC-V they want the driver to
> > > not have mappings from legacy to non-legacy config encodings for each
> > > vendor/model due to size, complexity and difficulty to update. It was
> > > reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> > > was broken and the sysfs/json events should always take precedent,
> > > however, it isn't clear this is still the case. It is the case that
> > > without working around this issue a legacy event like cycles without a
> > > PMU can encode differently than when specified with a PMU - the
> > > non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> > > encodings.
> > >
> > > The patch removes events and then adjusts the idx value for each
> > > evsel. This is done so that the dense xyarrays used for file
> > > descriptors, etc. don't contain broken entries. As event opening
> > > happens relatively late in the record process, use of the idx value
> > > before the open will have become corrupted, so it is expected there
> > > are latent bugs hidden behind this change - the change is best
> > > effort. As the only vendor that has broken event names is ARM, this
> > > will principally effect ARM users. They will also experience warning
> > > messages like those above because of the uncore PMU advertising legacy
> > > event names.
> > >
> > > Suggested-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> > > Signed-off-by: Ian Rogers <irogers@xxxxxxxxxx>
> > > Tested-by: James Clark <james.clark@xxxxxxxxxx>
> > > Tested-by: Leo Yan <leo.yan@xxxxxxx>
> > > Tested-by: Atish Patra <atishp@xxxxxxxxxxxx>
> > > ---
> > > tools/perf/builtin-record.c | 47 ++++++++++++++++++++++++++++++++-----
> > > 1 file changed, 41 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > > index 5db1aedf48df..c0b8249a3787 100644
> > > --- a/tools/perf/builtin-record.c
> > > +++ b/tools/perf/builtin-record.c
> > > @@ -961,7 +961,6 @@ static int record__config_tracking_events(struct record *rec)
> > > */
> > > if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
> > > perf_pmus__num_core_pmus() > 1) {
> > > -
> > > /*
> > > * User space tasks can migrate between CPUs, so when tracing
> > > * selected CPUs, sideband for all CPUs is still needed.
> > > @@ -1366,6 +1365,7 @@ static int record__open(struct record *rec)
> > > struct perf_session *session = rec->session;
> > > struct record_opts *opts = &rec->opts;
> > > int rc = 0;
> > > + bool skipped = false;
> > >
> > > evlist__for_each_entry(evlist, pos) {
> > > try_again:
> > > @@ -1381,15 +1381,50 @@ static int record__open(struct record *rec)
> > > pos = evlist__reset_weak_group(evlist, pos, true);
> > > goto try_again;
> > > }
> > > - rc = -errno;
> > > evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> > > - ui__error("%s\n", msg);
> > > - goto out;
> > > + ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> > > + evsel__name(pos), evsel__pmu_name(pos), msg);
>
> > How about changing it to pr_debug() and add below ...
>
> That sounds better.
>
> > > + pos->skippable = true;
> > > + skipped = true;
> > > + } else {
> > > + pos->supported = true;
> > > }
> > > -
> > > - pos->supported = true;
> > > }
> > >
> > > + if (skipped) {
> > > + struct evsel *tmp;
> > > + int idx = 0;
> > > + bool evlist_empty = true;
> > > +
> > > + /* Remove evsels that failed to open and update indices. */
> > > + evlist__for_each_entry_safe(evlist, tmp, pos) {
> > > + if (pos->skippable) {
> > > + evlist__remove(evlist, pos);
> > > + continue;
> > > + }
> > > +
> > > + /*
> > > + * Note, dummy events may be command line parsed or
> > > + * added by the tool. We care about supporting `perf
> > > + * record -e dummy` which may be used as a permission
> > > + * check. Dummy events that are added to the command
> > > + * line and opened along with other events that fail,
> > > + * will still fail as if the dummy events were tool
> > > + * added events for the sake of code simplicity.
> > > + */
> > > + if (!evsel__is_dummy_event(pos))
> > > + evlist_empty = false;
> > > + }
> > > + evlist__for_each_entry(evlist, pos) {
> > > + pos->core.idx = idx++;
> > > + }
> > > + /* If list is empty then fail. */
> > > + if (evlist_empty) {
> > > + ui__error("Failure to open any events for recording.\n");
> > > + rc = -1;
> > > + goto out;
> > > + }
>
> > ... ?
>
> > if (!verbose)
> > ui__warning("Removed some unsupported events, use -v for details.\n");
>
> And even this one would be best left for cases where we can determine
> that its a new situation, i.e. one that should work and not the ones we
> know that will not work already and thus so far didn't alarm the user
> into thinking something is wrong.
>
> Having the ones we know will fail as pr_debug() seems enough, I'd say.
This means that:
```
$ perf record -e data_read,LLC-prefetch-read -a sleep 0.1
```
will fail (as data_read is a memory controller event and the LLC
doesn't support sampling) with something like:
```
Error:
Failure to open any events for recording
```
Which feels a bit minimal. As I already mentioned, it is also a
behavior change and so has the potential to break scripts dependent on
the failure information.
A patch lowering the priority of error messages should be independent
of the 4 changes here. I'd be happy if someone follows this series
with a patch doing it.
Thanks,
Ian