Re: [PATCH 0/3] perf list: Collapse similar events across PMUs

From: James Clark
Date: Fri Mar 07 2025 - 09:47:40 EST

Next message: Jan Kara: "Re: [PATCH] fs: binfmt_elf_efpic: fix variable set but not used warning"
Previous message: H. Peter Anvin: "Re: [PATCH] x86/boot: Do not test if AC and ID eflags are changeable on x86_64"
In reply to: Ian Rogers: "Re: [PATCH 0/3] perf list: Collapse similar events across PMUs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On 05/03/2025 8:38 pm, Ian Rogers wrote:

On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@xxxxxxxxxx> wrote:

Some changes related to the discussion here [1] about deduplication of
PMUs. This change stands on its own right now, but it may not go far
enough. This can either be dropped for now or applied and improved
later. Extra ideas are as follows.

Treating alphanumeric suffixes as duplicate has been slightly
problematic due to marketing strings having looks-like-but-not-actually
alphanumeric suffixes. For example 'cpum_cf' and now the one digit
longer than before 'cortex-a720'. The easy fix is to increase the
minimum length considered for deduplication as suggested [1], but as
also mentioned, the current mrvl_ddr_pmu PMU names don't zero pad the
address, meaning that > 2 alphanumeric suffixes are already technically
not enough to deduplicate the same PMUs. They could have only a 2 digit
alphanumeric address suffix. Increasing the minimum digits feels a bit
like kicking the can down the road and it places awkward limitations on
marketing names which we have no control over. Also I'm not sure helps
the following much:

The hex suffix thing was very unfortunate, can we reverse that
decision and move the physical address.. it captures into a caps file?

Do you mean changing the PMU's name in the driver? I can't imagine that would go down too well given how long it's been there. Seems to me there's too much risk of it breaking some other tool or script. It would be a bit unfair to do that to fix an assumption that only really exists in Perf, and especially if there could be other ways to achieve the same behaviour in Perf.

Fwiw, we have a similar hex mess with raw events. A raw event can be
r0xead or read, but read also makes a pretty nice event name. This is
solved by first checking if read is an event and if not assuming it is
a raw encoding. Would anyone really care if raw events always had to
start with r0x? We seem to get tangled up in corner cases like this
too often, for example legacy event priorities and topdown event
sorting. Keeping things simple would be nice from a correctness pov
but also so that it's easy for other tools to emulate.

I think tools like this are always going to be full of various heuristics and magic to make the experience as user friendly as possible. If using the bare kernel interface was already fully featured enough then there wouldn't be a need for Perf to exist.

In an ideal world we'd have some capability file like you mentioned above. Or a folder with symlink to other instances of the same PMU then a tool can follow the links to other PMUs to deduplicate them. In theory we could add that right now. At least it wouldn't depend on any name restrictions, but it does push complexity that should probably exist in a tool into the kernel. In practice I think it would be a bit of a nightmare because it has to be re-implemented for everyone's PMU driver. That's got to be more code than tools doing it on the list of events produced.

The problem is that arm_cmn_[n] PMUs have a numeric suffix, but they can
have different events. Even if we were adding this PMU today, keeping
the suffix rule in mind, it would be difficult to come up with a suffix
that differentiates the different ones. Flavour words might work, but
that complicates the kernel which would have to group them and come up
with flavours rather than just doing an i++. Deduplicating too
aggressively on only PMU name suffix means only arm_cmn_1's events get
listed, missing other events, and it's hard to see which events relate
to which PMU.

Therefore in addition to the changes in this patchset I'd like to look
into:

* Collapsing duplicate PMU names into ranges, for example
arm_pmu_v3_[0-4], rather than simply concatenating names as done in
this patchset

The current suffix rule is weird as Intel's GPU i915 PMU appears to be
a PMU called 'i' with a 915 numeric suffix.
I think capturing the rules in the ABI doc and some kind of legacy
compatibility is do-able.
I like regular expressions over globs/fnmatch, for example, we could
use flex to turn the CPUID matches in mapfile.csv into something that
is parsed and matched with less interpretation/compilation at runtime.
I suspect we can bike-shed for a long time on what these new rules
will be, which has been why I've generally just tried to match
existing and inelegant behaviors.

I think at this point we could even consider a whitelist of known PMU names and how each one behaves. It's probably less than a handful long.

* Deduplicate uncore based on the contents of events/ rather than just
the suffix

As some background, the original commit for deduplication, commit
3241d46f5f54 ("perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu")
mentions reducing the number of duplicate PMUs, and is presumably
motivated by usability. But there are also other commits mentioning
reducing openat()s, for example lazily loading formats 504026412162
("perf pmu: Make the loading of formats lazy"). Deduplicating based on
the contents of the events/ folder is somewhat in contention with this
reduction, but could be done along side some more lazy loading (like of
the terms) and hashing the result of readdir() without opening any of
the contents. JSON tables can have event name hashes calculated at build
time if we want to consider them for deduplication too.

I worry about the run time cost of this when there are 100s of uncore

I'm interested in quantifying this time if we do any of these changes.

Is this particular example of 100s of PMUs with mrvl_ddr_pmu? How many events does it have? And which particular uses cases are most important, is it just listing or opening events? From what I can see these proposals shouldn't actually effect opening events, and you'd think listing could take a small time hit as it's done less frequently than opening. And there's already a lot of output.

PMUs. I wonder if the case of an empty events directory would also be
common.

Is this an issue in this case? If we're talking about listing events, then we show the single deduplicated instance as it is now. With an empty events folder nothing is lost.

Wrt the x86 conventions I think Kan is best placed to say what the
preferred convention should be.
The json will routinely say this event is on a PMU name without a
suffix, and the number of PMUs will only be known at runtime. Hashing
without the suffix would be fine but I think the controversy is over
what should be considered a suffix.

Well, I was considering what would happen if we always apply this deduplication without taking suffixes into account. The suffix rule just becomes a bit of a premature optimization to avoid reading the events folder in some cases, but if we make it acceptably fast all PMUs regardless of name could be considered for deduplication.

Thanks for doing the series, I'll dig into/test each one in turn.
Ian

Then with the events hash, PMU's can be sorted based on this and the
'Unit:' string can be constructed with a set of values that collapses
adjacent suffixes to display as ranges. I believe that could remove the
need for any further changes to duplication based on suffix, but still
avoids over deduplication.

[1]: https://lore.kernel.org/linux-perf-users/CAP-5=fW_Sq4iFxoWPWuixz9fMLBPyPUO0RG0KPbYa-5T0DZbTA@xxxxxxxxxxxxxx/

---
James Clark (3):
perf list: Order events by event name before PMU name
perf list: Collapse similar events across PMUs
perf list: Don't deduplicate core PMUs when listing events

tools/perf/builtin-list.c | 2 +
tools/perf/util/pmu.c | 5 ++-
tools/perf/util/pmu.h | 2 +
tools/perf/util/pmus.c | 95 ++++++++++++++++++++++++++++++++++--------
tools/perf/util/print-events.h | 1 +
5 files changed, 86 insertions(+), 19 deletions(-)
---
base-commit: 7788ad59d1d9617792037a83513be5b1dd14150f
change-id: 20250304-james-perf-hybrid-list-11b888cf435a

Best regards,
--
James Clark <james.clark@xxxxxxxxxx>

Next message: Jan Kara: "Re: [PATCH] fs: binfmt_elf_efpic: fix variable set but not used warning"
Previous message: H. Peter Anvin: "Re: [PATCH] x86/boot: Do not test if AC and ID eflags are changeable on x86_64"
In reply to: Ian Rogers: "Re: [PATCH 0/3] perf list: Collapse similar events across PMUs"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]