On Tue, Mar 4, 2025 at 5:50 AM James Clark <james.clark@xxxxxxxxxx> wrote:
Some changes related to the discussion here [1] about deduplication of
PMUs. This change stands on its own right now, but it may not go far
enough. This can either be dropped for now or applied and improved
later. Extra ideas are as follows.
Treating alphanumeric suffixes as duplicate has been slightly
problematic due to marketing strings having looks-like-but-not-actually
alphanumeric suffixes. For example 'cpum_cf' and now the one digit
longer than before 'cortex-a720'. The easy fix is to increase the
minimum length considered for deduplication as suggested [1], but as
also mentioned, the current mrvl_ddr_pmu PMU names don't zero pad the
address, meaning that > 2 alphanumeric suffixes are already technically
not enough to deduplicate the same PMUs. They could have only a 2 digit
alphanumeric address suffix. Increasing the minimum digits feels a bit
like kicking the can down the road and it places awkward limitations on
marketing names which we have no control over. Also I'm not sure helps
the following much:
The hex suffix thing was very unfortunate, can we reverse that
decision and move the physical address.. it captures into a caps file?
Fwiw, we have a similar hex mess with raw events. A raw event can be
r0xead or read, but read also makes a pretty nice event name. This is
solved by first checking if read is an event and if not assuming it is
a raw encoding. Would anyone really care if raw events always had to
start with r0x? We seem to get tangled up in corner cases like this
too often, for example legacy event priorities and topdown event
sorting. Keeping things simple would be nice from a correctness pov
but also so that it's easy for other tools to emulate.
The problem is that arm_cmn_[n] PMUs have a numeric suffix, but they can
have different events. Even if we were adding this PMU today, keeping
the suffix rule in mind, it would be difficult to come up with a suffix
that differentiates the different ones. Flavour words might work, but
that complicates the kernel which would have to group them and come up
with flavours rather than just doing an i++. Deduplicating too
aggressively on only PMU name suffix means only arm_cmn_1's events get
listed, missing other events, and it's hard to see which events relate
to which PMU.
Therefore in addition to the changes in this patchset I'd like to look
into:
* Collapsing duplicate PMU names into ranges, for example
arm_pmu_v3_[0-4], rather than simply concatenating names as done in
this patchset
The current suffix rule is weird as Intel's GPU i915 PMU appears to be
a PMU called 'i' with a 915 numeric suffix.
I think capturing the rules in the ABI doc and some kind of legacy
compatibility is do-able.
I like regular expressions over globs/fnmatch, for example, we could
use flex to turn the CPUID matches in mapfile.csv into something that
is parsed and matched with less interpretation/compilation at runtime.
I suspect we can bike-shed for a long time on what these new rules
will be, which has been why I've generally just tried to match
existing and inelegant behaviors.
* Deduplicate uncore based on the contents of events/ rather than just
the suffix
As some background, the original commit for deduplication, commit
3241d46f5f54 ("perf pmus: Sort/merge/aggregate PMUs like mrvl_ddr_pmu")
mentions reducing the number of duplicate PMUs, and is presumably
motivated by usability. But there are also other commits mentioning
reducing openat()s, for example lazily loading formats 504026412162
("perf pmu: Make the loading of formats lazy"). Deduplicating based on
the contents of the events/ folder is somewhat in contention with this
reduction, but could be done along side some more lazy loading (like of
the terms) and hashing the result of readdir() without opening any of
the contents. JSON tables can have event name hashes calculated at build
time if we want to consider them for deduplication too.
I worry about the run time cost of this when there are 100s of uncore
PMUs. I wonder if the case of an empty events directory would also be
common.
Wrt the x86 conventions I think Kan is best placed to say what the
preferred convention should be.
The json will routinely say this event is on a PMU name without a
suffix, and the number of PMUs will only be known at runtime. Hashing
without the suffix would be fine but I think the controversy is over
what should be considered a suffix.
Thanks for doing the series, I'll dig into/test each one in turn.
Ian
Then with the events hash, PMU's can be sorted based on this and the
'Unit:' string can be constructed with a set of values that collapses
adjacent suffixes to display as ranges. I believe that could remove the
need for any further changes to duplication based on suffix, but still
avoids over deduplication.
[1]: https://lore.kernel.org/linux-perf-users/CAP-5=fW_Sq4iFxoWPWuixz9fMLBPyPUO0RG0KPbYa-5T0DZbTA@xxxxxxxxxxxxxx/
---
James Clark (3):
perf list: Order events by event name before PMU name
perf list: Collapse similar events across PMUs
perf list: Don't deduplicate core PMUs when listing events
tools/perf/builtin-list.c | 2 +
tools/perf/util/pmu.c | 5 ++-
tools/perf/util/pmu.h | 2 +
tools/perf/util/pmus.c | 95 ++++++++++++++++++++++++++++++++++--------
tools/perf/util/print-events.h | 1 +
5 files changed, 86 insertions(+), 19 deletions(-)
---
base-commit: 7788ad59d1d9617792037a83513be5b1dd14150f
change-id: 20250304-james-perf-hybrid-list-11b888cf435a
Best regards,
--
James Clark <james.clark@xxxxxxxxxx>