[PATCH v2 0/7] PMU performance improvements

From: Ian Rogers
Date: Thu Oct 12 2023 - 13:56:53 EST


Improve the performance of PMU scanning by holding onto the
event/metric tables for a cpuid (avoiding repeated regular expression
comparisons) and by lazily computing the default perf_event_attr for a
PMU.
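
As a rough illustration of the lazy computation mentioned above, the
sketch below defers building a default configuration until it is first
requested. It is written in Python for brevity and is not the perf
implementation: LazyPmu, build_default_config and the returned dict are
hypothetical stand-ins for a PMU and its default perf_event_attr.

```python
class LazyPmu:
    """Hypothetical PMU wrapper that builds its default config on first use."""

    def __init__(self, name, build_default_config):
        self.name = name
        self._build = build_default_config  # callable taking the PMU name
        self._config = None
        self._computed = False

    @property
    def default_config(self):
        # Pay the construction cost only once, and only if someone asks.
        if not self._computed:
            self._config = self._build(self.name)
            self._computed = True
        return self._config


build_calls = []  # records which PMUs actually had a config built


def build_default_config(name):
    # Stand-in for an expensive, PMU-specific setup step (assumption).
    build_calls.append(name)
    return {"pmu": name, "config": 0}


pmus = [LazyPmu(n, build_default_config) for n in ("pmu_a", "pmu_b", "pmu_c")]
first = pmus[0].default_config   # triggers the build for pmu_a only
again = pmus[0].default_config   # served from the stored value
```

Scanning all three PMUs here costs nothing up front; only the PMU whose
default config is read pays for it, which is the kind of saving a scan
benchmark would be expected to show.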

Before:
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 251.990 usec (+- 4.009 usec)
Average PMU scanning took: 3222.460 usec (+- 211.234 usec)
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 260.120 usec (+- 7.905 usec)
Average PMU scanning took: 3228.995 usec (+- 211.196 usec)
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 252.310 usec (+- 3.980 usec)
Average PMU scanning took: 3220.675 usec (+- 210.844 usec)

After:
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 28.530 usec (+- 0.602 usec)
Average PMU scanning took: 275.725 usec (+- 18.253 usec)
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 28.720 usec (+- 0.446 usec)
Average PMU scanning took: 271.015 usec (+- 18.762 usec)
% Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
Average core PMU scanning took: 31.040 usec (+- 0.612 usec)
Average PMU scanning took: 267.340 usec (+- 17.209 usec)

Measuring the pmu-scan benchmark on a Tigerlake laptop, core PMU
scanning is reduced to 11.5% of the previous execution time and all PMU
scanning is reduced to 8.4% of the previous execution time (comparing
the averages of the three runs above). There is also a 4.3% reduction
in openat system calls.
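
The percentages can be rechecked from the benchmark output quoted
above. The short Python sketch below averages the three "Before" and
three "After" runs and divides the two; the variable names are only
for illustration.

```python
from statistics import mean

# Average scan times in usec, copied from the benchmark runs above.
before_core = [251.990, 260.120, 252.310]
after_core = [28.530, 28.720, 31.040]
before_all = [3222.460, 3228.995, 3220.675]
after_all = [275.725, 271.015, 267.340]


def percent_of(after, before):
    """Return mean(after) as a percentage of mean(before)."""
    return 100.0 * mean(after) / mean(before)


core_pct = percent_of(after_core, before_core)
all_pct = percent_of(after_all, before_all)

print(f"core PMU scanning: {core_pct:.1f}% of previous time")  # 11.5%
print(f"all PMU scanning:  {all_pct:.1f}% of previous time")   # 8.4%
```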

v2: Address feedback from Adrian Hunter and Yang Jihong so that the
caching handles CPUIDs that vary per PMU (currently an ARM64-only
feature) and also caches the case where there is no table to return.
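
The two v2 points can be pictured with a small Python sketch. This is
not the code generated by jevents.py: TableCache, the cpuid strings and
the patterns are all made up for illustration. The cache is keyed by
cpuid, so different PMUs may resolve to different tables, and a failed
search is stored as well, so a cpuid with no table is not re-matched
against every regular expression on each lookup.

```python
import re

# Hypothetical (cpuid pattern, table name) pairs; not real perf data.
HYPOTHETICAL_TABLES = [
    (r"vendor-a-\d+", "table_a"),
    (r"vendor-b-\d+", "table_b"),
]

_MISSING = object()  # sentinel: distinguishes "not cached" from "cached None"


class TableCache:
    """Remember the table (or the absence of one) found for each cpuid."""

    def __init__(self, tables):
        self._tables = tables
        self._cache = {}
        self.regex_scans = 0  # number of full regex searches performed

    def lookup(self, cpuid):
        hit = self._cache.get(cpuid, _MISSING)
        if hit is not _MISSING:
            return hit  # may be None: the negative result was cached
        self.regex_scans += 1
        result = None
        for pattern, table in self._tables:
            if re.fullmatch(pattern, cpuid):
                result = table
                break
        self._cache[cpuid] = result  # store None too
        return result


cache = TableCache(HYPOTHETICAL_TABLES)
cpuids = ("vendor-a-01", "vendor-x-99", "vendor-a-01", "vendor-x-99")
results = [cache.lookup(c) for c in cpuids]
```

Four lookups cost only two regex searches here, one per distinct cpuid,
including the cpuid that matches nothing.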

Ian Rogers (7):
  perf pmu: Rename perf_pmu__get_default_config to perf_pmu__arch_init
  perf intel-pt: Move PMU initialization from default config code
  perf arm-spe: Move PMU initialization from default config code
  perf pmu: Const-ify file APIs
  perf pmu: Const-ify perf_pmu__config_terms
  perf pmu-events: Remember the perf_events_map for a PMU
  perf pmu: Lazily compute default config

 tools/perf/arch/arm/util/cs-etm.c    |  13 +---
 tools/perf/arch/arm/util/pmu.c       |  10 +--
 tools/perf/arch/arm64/util/arm-spe.c |  48 ++++++------
 tools/perf/arch/s390/util/pmu.c      |   3 +-
 tools/perf/arch/x86/util/intel-pt.c  |  27 +++----
 tools/perf/arch/x86/util/pmu.c       |   6 +-
 tools/perf/pmu-events/jevents.py     | 109 +++++++++++++++++----------
 tools/perf/util/arm-spe.h            |   4 +-
 tools/perf/util/cs-etm.h             |   2 +-
 tools/perf/util/intel-pt.h           |   3 +-
 tools/perf/util/parse-events.c       |  12 +--
 tools/perf/util/pmu.c                |  38 +++++-----
 tools/perf/util/pmu.h                |  22 +++---
 tools/perf/util/python.c             |   2 +-
 14 files changed, 160 insertions(+), 139 deletions(-)

--
2.42.0.655.g421f12c284-goog