Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"

From: Ian Rogers
Date: Thu Feb 06 2025 - 02:45:21 EST


On Wed, Feb 5, 2025 at 9:09 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
>
> On Tue, Feb 04, 2025 at 08:48:20PM -0800, Ian Rogers wrote:
> > On Tue, Feb 4, 2025 at 5:58 PM Namhyung Kim <namhyung@xxxxxxxxxx> wrote:
> You mean the -z flag which is documented in the man page and also it the
> help message (perf top -h). Anyone can read the doc can know it's
> there. Of course, people would prefer reading zines than man pages. :)

I link to the patch. My point is that something as minor as making
"perf top" behave as "top" does was too big a (user command line)
regression to land - I strongly suspect nobody would notice. Your
proposal breaks all non-core events on every perf command that takes
PMU events. It is a bigger change.

> > So, it would seem to me that
> > changing something as fundamental as how all non-core events behave
> > would be seen as a regression.
>
> Yep, it'd be a regression.

Agreed, you are arguing for a regression.

> Which suffix do you mean?

It's off topic. ARM added hex suffixes to PMUs representing physical
memory addresses of memory controllers but then that makes cortex_a72
look like it has a 3 character suffix. So perf assumes hex digits more
than 4 characters long is a hex suffix, which of course it wouldn't be
for a1000 (which is also somewhat close to being an old Acorn
archimedes machine number ;-) ).

> Anyway, the person looked up the intel webpage would be eager to learn
> about performance related things. Can we also assume if they also want
> to learn about the perf tool itself? :)

I'm not sure how turning data_read into
uncore_imc_free_running/data_read/ is in anyway helping people
understand perf? They want an event name that matches the
documentation, manual, web site. It is what the vendors I've spoken to
want as they use the event names across tools (fwiw oprofile doesn't
even have a notion of a PMU). To my knowledge the PMU names are the
wild west, often illogical and never mentioned in any kind of
documentation. I have a hard time explaining how the suffixes work and
I believe there are more conventions in the works where there can be
multiple what we are currently calling suffixes.

> If it's not the case, we have this:
>
> $ perf record -e xxx
> event syntax error: 'xxx'
> \___ Bad event name
>
> Unable to find event on a PMU of 'xxx'
> Run 'perf list' for a list of valid events
>
> Usage: perf record [<options>] [<command>]
> or: perf record [<options>] -- <command> [<options>]
>
> -e, --event <event> event selector. use 'perf list' to list available events
>
> So it says twice to run 'perf list' to see the events. Then they can
> run either:
>
> $ perf list | grep xxx
>
> or
>
> $ perf list xxx
>
> to see the actual name of the event available in the perf tool.

Why was adding a PMU to an event name, working around ARM's PMU bug,
such an unsurmontable problem that the original change was reverted?
Because 1 person didn't want to have to write a PMU prefix and
considered it a monumental regression having to do so.

> >
> > Even with this what would be the behavior of core events? You want
> > legacy events to have priority over sysfs/json when there is no PMU.
> > You know, and have stated not caring, RISC-V wants different and that
> > it breaks Apple-M's PMUs for a fairly large range of kernel releases
> > including 1 LTS kernel - the only reason I'm writing patches in this
> > area in the 1st place. Software is soft and you can go fix software
> > anywhere in the stack. Listening to vendors and not breaking everyone
> > is the point-of-view these patches have been coming from. I find it
> > very hard to have a conversation where this is just forgotten about
> > and we're working on hypotheticals which seem to be both unwanted and
> > implausible.
>
> Sorry I don't want to repeat that too. Correct me if I'm wrong:

You are wrong.

> 1. RISC-V is working on a solution with the current status and it's not
> absoluted needed to change the current behavior.

They said to you directly it was what they wanted, that's why I
reposted this change and it is, has always been, in the cover letter.
They've then followed up expressing their desire for this behavior but
having to have a plan b as the original change was reverted and you
are blocking this change landing.

> 2. Apple-M is fixed already.

No, James tried to repro the bug on a Juno board, not an Apple M, and
didn't succeed. I don't know what kernel he tried. I was told by Mark
Rutland (at LPC) that the tool fix was absolutely necessary and the
PMU driver wouldn't be fixed, hence the series flipping behavior that
I thought Intel would most likely block and wasn't keen to do in the
1st place (not least wade through all the test behavior changes and
the bug tail). All of this was premised on a threat of reverting all
of the hybrid support so that Apple M could be made to work again, and
I was trying to do a less worse alternative.
https://lore.kernel.org/r/20231123042922.834425-1-irogers@xxxxxxxxxx

> >
> > I don't know why people (yourself, Linus) keep wanting to show me the
> > perf list output. It is arbitrary. I rewrote it and changed the
> > behavior of all uncore PMUs within it (we didn't used to deduplicate
> > based on the PMU suffix). It is nice that people think it reads like
> > some religious text.
>
> I think it's what we want users to know how to use the events.

I don't understand what you are trying to say. I'm saying the behavior
of perf list in its output is arbitrary. We use the same printing code
for every kind of event. An aesthetic decision to put things on a line
does not imply that it is more valid to use or not use a PMU, it just
happens to be what the tool does. Did I break perf list as if you look
in old perf list you see:
```
$ perf list
List of pre-defined events (to be used in -e or -M):

duration_time [Tool event]
...
```
while now you see:
```
$ perf list
List of pre-defined events (to be used in -e or -M):
...
tool:
duration_time
[Wall clock interval time in nanoseconds. Unit: tool]
...
```
I'm hoping people find it useful to have the unit documented.

> > Why is the formatting different in perf list for
> > json specified events? Well it is because json events have
> > descriptions and the events you are showing with a PMU don't have a
> > description. I think because there is no description, an effort was
> > made to keep the output compact and put the PMU and event name
> > together. It wasn't trying to enter some kind of long lasting marriage
> > that the event name should only ever be used with the PMU.
>
> I like the description but I don't like the formatting. I think I
> understand why it looks like that but it could be different. Anyway,
> I don't think showing PMU name is related to having descriptions.

No, it has more to do with how I was feeling about filling in two
string fields called name and alias when rewriting the perf list code.
I added aliases containing the PMU name just to add a little bit more
detail when there seemed to be little documentation with certain
events. I never intended placing the PMU names into any events to be a
commitment that all non-core PMU events would need a PMU prefix and to
break all such people using those events.

> > What happens if an event is both in sysfs and json? Well the sysfs event
> > will get the description from the json and then I believe it won't
> > behave as you show. Did the event get broken, as perf list no longer
> > shows it with a PMU, by having a json description written? I think not
> > and I think having descriptions with events is a good thing.
>
> That's bad. Probably we should fix it takes only one of the sources and
> change the JSON event not to clash with sysfs.

No, you are talking about breaking everything already, let's not break
it yet further - not least as we lack a reasonable way to test it. I
think if you are serious about having such breaking changes then it is
best you add a new command line option, like with libpfm events.

Thanks,
Ian