Re: [RFC] perf tools: About encodings of legacy event names

From: Ian Rogers
Date: Fri Mar 07 2025 - 13:48:22 EST


On Fri, Mar 7, 2025 at 7:10 AM Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:
>
> On Fri, Mar 07, 2025 at 02:17:22PM +0000, James Clark wrote:
> > On 24/02/2025 3:01 pm, Arnaldo Carvalho de Melo wrote:
> > > On Wed, Feb 19, 2025 at 10:37:33PM -0800, Ian Rogers wrote:
> > > > I knew of this tech debt and separately RISC-V was also interested to
> > > > have sysfs/json be the priority so that the legacy to config encoding
> > > > could exist more in the perf tool than the PMU driver. I'm a SIG
>
> > > I saw them saying that supporting PERF_TYPE_HARDWARE counters was ok as
> > > they didn't want to break the perf tooling workflow, no?
>
> > Doesn't most of the discussion stem from this particular point? I also
> > understood it that way, that risc-v folks agreed it was better to support
> > these to make all existing software work, not just Perf.
>
> That is my understanding, and I agree with them and with you.

This is describing what RISC-V have been forced into doing:
1) to support non-perf tooling,
2) because perf is inconsistent in how it prioritizes legacy versus
sysfs/json events.

Their preference has been to move these problems into the tool not the
PMU driver. What you are saying here is to ignore their preference.
I've already quoted them in this thread saying this, but this keeps
being ignored. Here is my previous message:
https://lore.kernel.org/lkml/CAP-5=fXSgpZaAgickZSWgjt-2iTWK7FFZc65_HG3QhrTg1mtBw@xxxxxxxxxxxxxx/

> > Maybe one issue was calling them 'legacy' events in the first place, and I'm
> > not sure if there is complete consensus that these are legacy.
>
> I don't see them as "legacy".

So let me say this is really distracting from the intent of the
series. The series is:
1) trying to clean up wild carding ambiguity - not making it dependent
on the name of the event being parsed, so that the behavior of
`cpu_cycles` matches that of `cpu-cycles`
2) trying to make the legacy vs sysfs/json prioritization consistent -
making it so that `cpu_core/instructions/` encoding matches
`instructions` as we display both of these as cpu_core/instructions/
and it is confusing to a user that different encodings were used. We
also pattern match perf_event_attr config values in places like:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/util/topdown.c?h=perf-tools-next#n38
so >1 config for the same event means such pattern matching needs to
consider all cases.

There is now a "Make Legacy Events Great Again" (MLEGA) effort that
is standing in the way of clean up work. As already stated but
repeating, why is MLEGA a bad thing:
1) legacy events lack descriptions and are open for interpretation.
For example, do the events include counts for things done
speculatively?
2) it is unneeded. Vendors can choose to name events the same name in
sysfs and json. ARM are achieving pretty much all of the same thing
with architecture standard events but in their use they will have
appropriate event descriptions for each model giving all the caveats
for the event. When something is common we can encode it in the common
json; we don't need legacy events for this:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common?h=perf-tools-next
3) LLC doesn't mean L2, it nearly always means L3, the event names
have become obsolete and confusing. More MLEGA means more of this.
4) PMUs have only ever supported a subset of the legacy events. We
have to make the use of legacy events in `perf stat` not fail when they
are implicitly added as default events and via the -ddd options.
5) multiple encodings/PMU types for the same thing complicates things
like topdown event ordering, that is a kernel/PMU restriction, and
metric event deduplication.
6) legacy events are broken on ARM Apple-M and have been broken on Juno boards.
7) architectures trying to push complexity into user land (RISC-V) are
being forced to push it into the kernel/driver.
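
Point 4 can be sketched as follows. This is a toy model, not perf's
code: `open_event`, `open_events` and the `SUPPORTED` set are
hypothetical names standing in for the real open path on an imaginary
PMU that only implements two legacy events.

```python
import errno

# Hypothetical: the legacy events an imaginary PMU implements.
SUPPORTED = {"cycles", "instructions"}


def open_event(name):
    """Stand-in for opening an event; fails for unsupported events."""
    if name not in SUPPORTED:
        raise OSError(errno.ENOENT, "event not supported", name)
    return name


def open_events(names, implicit):
    """Open events, skipping failures only when they were added implicitly."""
    opened, skipped = [], []
    for name in names:
        try:
            opened.append(open_event(name))
        except OSError:
            if not implicit:
                raise  # the user asked for this event, so report the error
            skipped.append(name)
    return opened, skipped


defaults = ["cycles", "instructions", "LLC-loads"]
opened, skipped = open_events(defaults, implicit=True)
```

The extra branch exists only because the default list names events a
PMU may not have.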

Is MLEGA relevant here? Well, if you want legacy events to take
priority over sysfs/json then yes. For wild carding I don't see why
MLEGA cares. Do
I want to push on MLEGA? No, and I think the reasons above are why it
hasn't happened in over 10 years.

> > Can't they continue be the short easy list of events likely to be common across
> > platforms?
>
> That is my understanding of the original intent, yes.
>
> A first approximation, those who want to dig deeper, well, learn more
> about the architecture, learn about the extensive support for
> vendor/JSON events, sysfs ones, how to properly configure them taking
> advantage of the high level of flexibility both perf, the tool and perf
> the kernel subsystem allows them to be used, in groups, leader sampling,
> multiplexing or not, etc.
>
> But lots of developers seem to be OK with just the default events or
> using those aliases for expected events across architectures, sometimes
> specifying :ppp as a hint that if there are more precise events in this
> architecture, please use them, for instance.

When and where have I said that I don't want to support events like
instructions and cycles? See above, consistent wild carding and the
encoding priority are the only issues here.
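
As a toy illustration of the wild carding point (an assumption about
how name dependence can arise, not a description of perf's parser):
`PMUS`, `LEGACY_NAMES` and both resolve functions are hypothetical.

```python
# Hypothetical PMU inventory: PMU name -> (is_core, advertised event names).
PMUS = {
    "core_pmu": (True, {"cpu_cycles"}),
    "uncore_pmu": (False, {"cpu_cycles"}),
}

# Hypothetical table mapping a legacy spelling to a sysfs spelling.
LEGACY_NAMES = {"cpu-cycles": "cpu_cycles"}


def resolve_name_dependent(event):
    """Legacy spellings only try core PMUs; other names search every PMU."""
    if event in LEGACY_NAMES:
        return sorted(p for p, (core, _) in PMUS.items() if core)
    return sorted(p for p, (_, events) in PMUS.items() if event in events)


def resolve_consistent(event):
    """Both spellings go through the same PMU search."""
    name = LEGACY_NAMES.get(event, event)
    return sorted(p for p, (_, events) in PMUS.items() if name in events)
```

Which PMUs the shared rule should search is a separate choice; the
sketch only shows both spellings getting the same answer.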

> > If there is an issue with some of them being wrong in some places
> > we can move forward from that by making sure new platforms do it right,
>
> And adding special case for broken things when we know that some event
> named "cycles" shouldn't be used for sampling, for instance.

What is this? A new framework for special casing PMUs and events,
where we're maintaining lists of broken PMUs and changing encodings?
And tooling like event sorting, metrics, is all supposed to just work
with this? Are we going to write json for this? Who is writing/testing
it for Apple-M?

Special cases should be the exception and not an expected norm.

> > rather than changing the logic for everyone to fix that bug.
>
> Right. And again, if something doesn't work for a while in some
> architecture, its just a matter of specifying the name of the event in
> full form, with the PMU prefix, etc.

So MLEGA would like to use sysfs/json when legacy events are broken? This is just
silly, if something is broken we should just not use it. Having 2 ways
of stating something and expecting different behaviors from them is
clearly brittle.

> > For the argument that Google prefers to use the sysfs events because of
> > these differences, I don't think there is anything preventing that kind of
> > use today?
>
> Indeed.

I explained that in the context of why legacy events are wrong. I've
repeated it above. This is not addressing the issues of wild carding
and the encoding priority.

> > Or at least not for the main priority flip proposed, but maybe
> > there are some smaller adjacent bugs that can be fixed up separately.
>
> Yes, and work in this area is greatly appreciated.

I don't know what your proposals are and to my eyes none of them have
ever existed, no one has created them in over 10 years.
I am trying to fix wild carding and the encoding priority.
Please can we move the bike shedding on MLEGA to a separate email thread.

Thanks,
Ian