Re: [PATCH v1] perf evlist: Force adding default events only to core PMUs
From: Ian Rogers
Date: Tue May 28 2024 - 01:37:08 EST
On Mon, May 27, 2024 at 3:58 AM Leo Yan <leo.yan@xxxxxxxxx> wrote:
>
> On Sat, May 25, 2024 at 02:14:26PM -0700, Linus Torvalds wrote:
> > On Sat, 25 May 2024 at 09:43, Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > This makes 'perf record' work for me again.
> >
> > Oh, wait, no it doesn't.
> >
> > It makes just the plain "perf record" without any arguments work,
> > which was what I was testing because I was lazy.
> >
> > So now
> >
> > $ perf record sleep 1
> >
> > works fine. But
> >
> > $ perf record -e cycles:pp sleep 1
> >
> > is still completely broken (with or without ":p" and ":pp").
>
> Seems to me that this patch fails to check if a PMU is a core-attached
> PMU that can support common hardware events. Therefore, we should
> consider adding the following check.
>
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 30f958069076..bc1822c2f3e3 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -1594,6 +1594,9 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
> while ((pmu = perf_pmus__scan(pmu)) != NULL) {
> bool auto_merge_stats;
>
> + if (hw_config != PERF_COUNT_HW_MAX && !pmu->is_core)
> + continue;
> +
> if (parse_events__filter_pmu(parse_state, pmu))
> continue;
>
> To be clear, I only compiled this change but I have no chance to test
> it. @Ian, could you confirm this?
Hi Leo,
so the code is working as intended. I believe it also agrees with what
Arnaldo thinks.
If you do:
$ perf stat -e cycles ...
and you have
/sys/devices/pmu1/events/cycles
/sys/devices/pmu2/events/cycles
The output of perf stat should contain counts for pmu1 and pmu2. Were
the event 'data_read' or 'inst_retired.any' we wouldn't be having the
question as this has always been perf's behavior - changing that
behavior would be a regression and why we can't just limit wildcard
searches to core PMUs.
The issue is about legacy events. There are more potential legacy
names than those in 'enum perf_hw_id' as there are all the hardware
cache events. So ignoring "hw_config != PERF_COUNT_HW_MAX" when adding
an event to >1 PMU is only a partial solution, but I (and I think
others agree) don't think legacy names should be special in this way.
If you ask for an event and multiple PMUs advertise it then the
behavior should be to count the event.
Linus' trouble stems from cycles being used as a default value without
restricting it to core PMUs. This patch solves that. There is then the
issue that if an event that doesn't support sampling is wild card
matched by `perf record` then it should be ignored. I'm in less
agreement here, such events failing could be regarded as a feature.
The workaround is to add the PMU name when specifying the event. If we
do allow such events to be skipped, should the skipping be accompanied
by a warning? Presumably if no events can be opened then things should
terminate. Restricting everything to the core PMU I think is
short-sighted as knowing what's happening on accelerators is
increasingly important.
For the sake of fixing TMA metrics and things like default attributes
for ARM I added a notion that evsels in perf stat can be skippable.
The plumbing for this with perf record is more tricky as the logic has
half migrated into libperf - parsing and skippable are in perf, the
event opening and mmap logic is all in libperf, and we need to add
ignoring skipped evsels when adding the events to the mmap-ed ring
buffer. I have a patch that's WIP for this, but I also think we could
also agree that when >1 PMU advertises an event, perf's behavior when
matching should be to open all such events. You avoid this by
specifying a PMU name.
Thanks,
Ian
> Thanks,
> Leo
>
>
> > So no. That still needs to be fixed, or the whole "prefer sysfs/JSON
> > by default" needs to be reverted.
>
>