[PATCH v21 00/21] Add additional python API support

From: Ian Rogers

Date: Mon Jun 15 2026 - 21:16:54 EST


The perf script command has long supported running Python and Perl scripts
by embedding libpython and libperl. This approach has several drawbacks:
- overhead from creating Python dictionaries for every event (whether used
or not),
- complex build dependencies on specific Python/Perl versions,
- complications with threading due to perf being the interpreter,
- no clear way to run standalone scripts like ilist.py.
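
As an illustration of the first drawback, here is a minimal sketch that
contrasts building a full dictionary per event with accessors that convert a
field only when asked. It is not the perf API: RAW_EVENT, decode(),
eager_event() and LazySample are hypothetical names that exist only to count
the conversion work.

```python
# Hypothetical stand-in for one decoded event; not the real perf API.
RAW_EVENT = {"pid": 1234, "tid": 1234, "ip": 0xffffffff81000000,
             "comm": b"bash\x00"}

decode_calls = 0


def decode(value):
    """Pretend C-to-Python conversion of one field, counting the work done."""
    global decode_calls
    decode_calls += 1
    if isinstance(value, bytes):
        return value.rstrip(b"\x00").decode()
    return value


def eager_event(raw):
    """Dictionary style: every field is converted for every event."""
    return {name: decode(value) for name, value in raw.items()}


class LazySample:
    """Accessor style: a field is converted only when the script reads it."""

    def __init__(self, raw):
        self._raw = raw

    def __getattr__(self, name):
        # Only called for names that are not regular attributes.
        try:
            return decode(self._raw[name])
        except KeyError:
            raise AttributeError(name) from None


eager = eager_event(RAW_EVENT)
eager_cost = decode_calls      # one conversion per field

decode_calls = 0
lazy = LazySample(RAW_EVENT)
pid = lazy.pid
lazy_cost = decode_calls       # one conversion for the one field read
```

A script that reads a single field pays for one conversion in the accessor
style, against one conversion per field in the dictionary style.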

This series takes a different approach, with an initial implementation posted
as an RFC in October 2023:
https://lore.kernel.org/linux-perf-users/20231025081156.963491-1-irogers@xxxxxxxxxx/

It builds the Python extension as part of the normal build. The extension
is able to read perf.data files. The event callbacks are converted to
have a Python evsel/evlist/sample passed to them.

To make the review process more manageable, the original 58-patch
series has been split. This v21 series represents "Phase 1: API &
Infrastructure" (21 patches). The first 4 patches of Phase 1
(cleanups and arch-specific header sorting) have already been merged
upstream.

This remaining set contains:
1. Missed explicit dependency cleanups and header sorting for util/ and python.
2. Crucial core safety infrastructure (reference counting for evlist/evsel)
to support safe lifecycle management in garbage-collected Python.
3. The core Python API extensions (session wrappers, perf_data wrappers,
sample accessors, stubs, and LiveSession helper).
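
To show why item 2 matters for a garbage-collected caller, here is a toy
model of get/put reference counting. It is a sketch under assumptions, not
the evlist/evsel implementation: RefCounted and Wrapper are hypothetical
classes, and the "free" is only a flag.

```python
class RefCounted:
    """Toy model of a C object managed with get/put reference counting."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 1        # the creator owns the first reference
        self.freed = False

    def get(self):
        """Take an extra reference; only legal while the object is alive."""
        if self.freed:
            raise RuntimeError(f"use after free of {self.name}")
        self.refcnt += 1
        return self

    def put(self):
        """Drop one reference and free the object when the last one goes."""
        if self.freed:
            raise RuntimeError(f"put on freed {self.name}")
        self.refcnt -= 1
        if self.refcnt == 0:
            self.freed = True


class Wrapper:
    """Toy Python-side wrapper that pins the underlying object."""

    def __init__(self, obj):
        self.obj = obj.get()

    def close(self):
        self.obj.put()


evsel = RefCounted("evsel")    # first reference, owned by a (toy) evlist
wrapper = Wrapper(evsel)       # the wrapper takes a second reference
evsel.put()                    # the evlist goes away and drops its reference
alive_after_evlist = not evsel.freed
wrapper.close()                # the last reference is dropped here
freed_after_wrapper = evsel.freed
```

Without the second reference, the wrapper would be left pointing at freed
memory as soon as the owner went away.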

Phase 2 ("Script Porting & Tool Migration") will migrate the remaining 35+
existing Python/Perl scripts to the new API (which yields up to 35x speedups,
as demonstrated previously) and finally remove the embedded interpreters.


Addressing v20 Review Feedback:

The v21 patches merely update a commit message (Patch 11). Sashiko has correctly
identified several structural limitations and edge cases. Almost all
of these points correspond to known limitations that we have
explicitly mapped out and planned for "Phase 2" of this Python binding
refactor. Our primary goal with this v21 series (Phase 1) is to
establish the baseline abstractions, fix immediate memory corruption
bugs, and provide the foundational wrappers.

- Patch 4 (perf data: Add open flag): The reported issue with
perf_data__switch failing to close thread-local files during directory
mode rotation is a pre-existing bug in the C core. This patch strictly
fixes the boolean data->open state tracking for Python integration.

- Patch 7 (perf evlist: Add reference count checking): evlist is
fundamentally not thread-safe for concurrent destruction in the C core.
The asymmetric leak (leaving cycles intact if evlist is dropped before
evsel) is a known limitation. Implementing a thread-safe, symmetrical
global GC lifecycle for evlist/evsel is planned for Phase 2.

- Patch 10 (perf python: Add python session abstraction):
1) perf.thread cannot cause a NULL dereference because it does not
implement tp_new/tp_init, so it cannot be instantiated directly from
Python. 2) Proper cross-endian analysis of packed 32-bit fields and
the implementation of the remaining tool callbacks are explicitly
deferred to Phase 2.

- Patch 11 (perf python: Refactor and add accessors to sample event):
1) Fixing the session use-after-free (UAF) means tracking the lifetime of
the session relative to the events that reference it, which requires a
major lifecycle overhaul slated for Phase 2. 2) The commit message
previously claimed, inaccurately, that we only allocate the strictly
necessary copy size. This has been corrected in v21. We use PyObject_New
with the static tp_basicsize; optimizing the dynamic allocation size is a
future cleanup.

- Patch 16 (perf python: Add syscall name/id): The legacy scripts
remain untouched in this series. Migrating the existing Python/Perl
scripts to use the new C-API and dropping libaudit entirely is the
core objective of Phase 2.

- Patch 20 (perf python: Add perf.pyi stubs file): While the stub is
currently loosely typed as Any for threads, the segmentation fault is
no longer possible since Patch 19 added runtime type validation
(PyObject_TypeCheck) to the C implementation. We will tighten the
stub typing in a follow-up patch.

Addressing v19 Review Feedback:
- Patch 19: Added PyObject_TypeCheck runtime validations to parse_events
and parse_metrics to prevent memory corruption when invalid objects
are passed from Python, resolving the blind C cast vulnerability.
- Patch 20: Updated perf.pyi stubs to properly type the threads parameter
as Optional['thread_map'] instead of Optional[Any] to catch invalid
types during static analysis.
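
The two layers described above (a typed stub for static analysis and a
runtime check in the implementation) can be sketched in pure Python. This is
an analogy, not the extension code: the thread_map class body and
parse_events_checked() are hypothetical stand-ins, and isinstance() plays
the role that PyObject_TypeCheck plays in C.

```python
from typing import Optional


class thread_map:
    """Stand-in for the extension's thread_map type (hypothetical body)."""

    def __init__(self, tids):
        self.tids = list(tids)


def parse_events_checked(event_str: str,
                         threads: Optional[thread_map] = None) -> dict:
    """Reject wrong argument types before they are used as a thread_map."""
    # The annotation helps static checkers; this check protects at runtime.
    if threads is not None and not isinstance(threads, thread_map):
        raise TypeError(
            f"threads must be a thread_map or None, not "
            f"{type(threads).__name__}")
    return {"events": event_str.split(","), "threads": threads}


result = parse_events_checked("cycles,instructions", thread_map([1, 2]))

try:
    parse_events_checked("cycles", "not a thread map")
    rejected = False
except TypeError:
    rejected = True
```

The annotation alone would not stop a caller that ignores type checking,
which is why the runtime check is the part that prevents the bad cast.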

Note: Other architectural limitations raised in the v19 review (e.g. TOCTOU
cycle races, asymmetric cycle leaks, cross-endian needs_swap bypass, and
session object dangling pointers) are explicitly acknowledged as limitations
of this transitional patch set. As noted previously, implementing thread-safe,
symmetrical GC for the Python bindings and hardening the C-API boundary
are the primary focus of the upcoming Phase 2 series.

Addressing v18 Review Feedback:
- Patch 10 (`perf.thread` initialization): Added missing `CHECK_INITIALIZED()`
in `pyrf_thread__comm` to prevent NULL dereference when instantiated directly.
- Patch 14 (STAT events): Passed `NULL` to `pyrf_event__new` for STAT events
to prevent unconditional `evsel__parse_sample` and potential out-of-bounds
reads.
- Patch 20 (`LiveSession` timeout): Broadened exception handling to explicitly
ignore "Unexpected header type" for valid but unsupported events.
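
The exception filtering idea can be sketched as follows. This is not the
perf_live.py code: drain() and the three reader functions are hypothetical,
and only the two message strings ("Unknown CPU" and "Unexpected header
type") are taken from this changelog.

```python
# Messages treated as benign; both strings are quoted in this changelog.
IGNORABLE_MESSAGES = ("Unknown CPU", "Unexpected header type")


def drain(readers):
    """Call each reader, skipping only TypeErrors known to be benign."""
    events, skipped = [], 0
    for read in readers:
        try:
            events.append(read())
        except TypeError as exc:
            if not any(msg in str(exc) for msg in IGNORABLE_MESSAGES):
                raise          # a real parsing failure must stay visible
            skipped += 1
    return events, skipped


def ok():
    return "sample"


def unsupported():
    raise TypeError("Unexpected header type 79")


def broken():
    raise TypeError("argument must be int")


events, skipped = drain([ok, unsupported, ok])
```

Matching on message text is brittle, so a sketch like this would need to be
kept in step with the strings the extension actually raises.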

Note: Other feedback items raised in v18 (TOCTOU cycle races, asymmetric
cycle collection, cross-endian data handling, guest machine symbol resolution,
and pre-existing memory leaks/uninitialized variables) are acknowledged as
limitations of the current implementation and will be addressed in Phase 2
or separate cleanup patches.

Addressing v17 Review Feedback:
- Patch 8: Added missing `perf_sample__exit(&sample)` to
`intel_pt_synth_ptwrite_sample()` to fix an evsel reference leak.
- Patch 10: Fixed a bug in `pyrf_session_tool__sample()` that caused
double byte-swapping on foreign-endian files by temporarily disabling
`needs_swap` during re-parsing instead of assigning `*sample`.
- Patch 11: Address resolution for guest samples was not fixed in this
  version; it will be fixed in the next spin or in Phase 2.
- Patch 19: Prevented `python-clean` from deleting the tracked source file
`python/perf.pyi` when building in tree. Also explicitly exported the
`COUNT_HW_REF_CPU_CYCLES`, `thread`, `callchain`, and `callchain_node`
types/constants to the module via `PyInit_perf`.
- Patch 20: Narrowed `except TypeError` in `LiveSession.run()` to explicitly
check for "Unknown CPU" so legitimate event parsing failures aren't
swallowed.
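
The double byte-swapping problem from Patch 10 above can be reproduced in a
few lines. This is a sketch under assumptions, not the
`pyrf_session_tool__sample()` code: Parser, bswap32() and swap_disabled()
are hypothetical, and only the `needs_swap` name comes from the changelog.

```python
import struct
from contextlib import contextmanager


def bswap32(value):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


class Parser:
    """Toy parser that swaps each field when the file is foreign-endian."""

    def __init__(self, needs_swap):
        self.needs_swap = needs_swap

    def parse(self, value):
        return bswap32(value) if self.needs_swap else value


@contextmanager
def swap_disabled(parser):
    """Temporarily clear needs_swap, restoring it even on error."""
    saved = parser.needs_swap
    parser.needs_swap = False
    try:
        yield parser
    finally:
        parser.needs_swap = saved


parser = Parser(needs_swap=True)
on_disk = 0x01000000                 # the value 1 in the foreign byte order

first_parse = parser.parse(on_disk)  # swapped once: correct
double_swapped = parser.parse(first_parse)   # swapped twice: wrong again

with swap_disabled(parser) as p:
    reparsed = p.parse(first_parse)  # re-parse without a second swap
```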

Addressing v16 Review Feedback:
- Patch 10: Removed unconditional `perf_session__create_kernel_maps` to
prevent corrupting cross-platform offline analysis.
- Patch 11: Corrected inaccurate commit message regarding memory
allocation sizing.
- Patch 19: Fixed numerous type inconsistencies, missing properties, and
incorrect return types in the `perf.pyi` stubs file.
- Patch 20: Cleaned up unused `import errno` in `perf_live.py`.

Note: Several issues spotted in v16/v17 review (e.g. pyrf_evsel__init format
string type mismatch, evlist lockless double free, asymmetric memory
leaks, missing Py_None type checks, and lack of NUL-termination for
COMM/MMAP) are pre-existing limitations in the codebase or side-effects
of the transitional cycle-breaking design. As discussed previously, these
structurally complex or pre-existing bugs are deliberately deferred to
the Phase 2 series.

Addressing v15 Review Feedback:
- Patch 2 (buffer overflow & type checks): The buffer overflow in
`pyrf_event__new()` has been resolved by verifying `event->header.size`
against the event struct size.
- Patch 2 (PyObject_HEAD_INIT): The initialization macros have been corrected
to use the proper Python 3 compatibility approach.
- Patch 5 (Memory leak & RC validation): Applied extensive structural fixes
using `refcount_t` semantics. Added validation wrapper structures to
statically verify memory access safety in the lockless cycles.
- Patch 8 (`evlist.open` memory leak): Restructured lifecycle management for
mmap buffers using `do_munmap()` hooks in `evlist__put()`.
- Patch 16 (`perf.pyi` stubs): Corrected return types (`Optional` and proper
object properties) and missing documentation strings in type stubs.
- Patch 20 (`perf_live.py` timeout): Adjusted poll timeout from 10000ms back
to 100ms, replacing the tight exception loop.
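
The size validation mentioned for `pyrf_event__new()` above follows a common
pattern: trust the header's size field only after checking it against known
bounds. The sketch below shows that pattern in Python and is not the C fix
itself. It assumes a little-endian buffer, checked_payload() is a
hypothetical helper, and the upper-bound check against the buffer length is
an addition for this sketch. The header layout (u32 type, u16 misc, u16
size) is that of struct perf_event_header.

```python
import struct

# struct perf_event_header: u32 type, u16 misc, u16 size (little-endian here).
HEADER = struct.Struct("<IHH")


def checked_payload(buf, min_event_size):
    """Return the event payload only if the header size field is plausible."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer shorter than an event header")
    _type, _misc, size = HEADER.unpack_from(buf)
    if size < min_event_size:
        raise ValueError(f"event size {size} below struct size {min_event_size}")
    if size > len(buf):
        raise ValueError(f"event size {size} exceeds buffer of {len(buf)} bytes")
    return buf[HEADER.size:size]


good = HEADER.pack(9, 0, 16) + b"\x01" * 8
payload = checked_payload(good, min_event_size=16)
```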

Ian Rogers (21):
perf util: Sort includes and add missed explicit dependencies
perf python: Add missed explicit dependencies
perf evsel/evlist: Avoid unnecessary #includes
perf data: Add open flag
perf evlist: Add reference count
perf evsel: Add reference count
perf evlist: Add reference count checking
perf python: Use evsel in sample in pyrf_event
perf python: Add wrapper for perf_data file abstraction
perf python: Add python session abstraction wrapping perf's session
perf python: Refactor and add accessors to sample event
perf python: Add mmap2 event
perf python: Add callchain support
perf python: Extend API for stat events in python.c
perf python: Expose brstack in sample event
perf python: Add syscall name/id to convert syscall number and name
perf python: Add config file access
perf python: Handle Py_None for thread and cpu maps
perf python: Add type checking for parse_events/parse_metrics
perf python: Add perf.pyi stubs file
perf python: Add LiveSession helper

tools/perf/Makefile.perf | 7 +-
tools/perf/arch/arm/util/cs-etm.c | 10 +-
tools/perf/arch/arm64/util/arm-spe.c | 8 +-
tools/perf/arch/arm64/util/hisi-ptt.c | 2 +-
tools/perf/arch/x86/tests/hybrid.c | 22 +-
tools/perf/arch/x86/tests/topdown.c | 4 +-
tools/perf/arch/x86/util/auxtrace.c | 2 +-
tools/perf/arch/x86/util/intel-bts.c | 6 +-
tools/perf/arch/x86/util/intel-pt.c | 9 +-
tools/perf/arch/x86/util/iostat.c | 14 +-
tools/perf/bench/evlist-open-close.c | 29 +-
tools/perf/builtin-annotate.c | 7 +-
tools/perf/builtin-ftrace.c | 14 +-
tools/perf/builtin-inject.c | 9 +-
tools/perf/builtin-kvm.c | 14 +-
tools/perf/builtin-kwork.c | 8 +-
tools/perf/builtin-lock.c | 4 +-
tools/perf/builtin-record.c | 95 +-
tools/perf/builtin-report.c | 6 +-
tools/perf/builtin-sched.c | 30 +-
tools/perf/builtin-script.c | 15 +-
tools/perf/builtin-stat.c | 83 +-
tools/perf/builtin-top.c | 104 +-
tools/perf/builtin-trace.c | 65 +-
tools/perf/python/perf.pyi | 672 +++++
tools/perf/python/perf_live.py | 59 +
tools/perf/tests/backward-ring-buffer.c | 26 +-
tools/perf/tests/code-reading.c | 14 +-
tools/perf/tests/event-times.c | 6 +-
tools/perf/tests/event_update.c | 4 +-
tools/perf/tests/evsel-roundtrip-name.c | 8 +-
tools/perf/tests/evsel-tp-sched.c | 4 +-
tools/perf/tests/expand-cgroup.c | 12 +-
tools/perf/tests/hists_cumulate.c | 2 +-
tools/perf/tests/hists_filter.c | 2 +-
tools/perf/tests/hists_link.c | 2 +-
tools/perf/tests/hists_output.c | 2 +-
tools/perf/tests/hwmon_pmu.c | 7 +-
tools/perf/tests/keep-tracking.c | 10 +-
tools/perf/tests/mmap-basic.c | 24 +-
tools/perf/tests/openat-syscall-all-cpus.c | 6 +-
tools/perf/tests/openat-syscall-tp-fields.c | 26 +-
tools/perf/tests/openat-syscall.c | 6 +-
tools/perf/tests/parse-events.c | 139 +-
tools/perf/tests/parse-metric.c | 8 +-
tools/perf/tests/parse-no-sample-id-all.c | 2 +-
tools/perf/tests/perf-record.c | 38 +-
tools/perf/tests/perf-time-to-tsc.c | 12 +-
tools/perf/tests/pfm.c | 12 +-
tools/perf/tests/pmu-events.c | 11 +-
tools/perf/tests/pmu.c | 4 +-
tools/perf/tests/sample-parsing.c | 45 +-
tools/perf/tests/shell/lib/setup_python.sh | 13 +
tools/perf/tests/sw-clock.c | 20 +-
tools/perf/tests/switch-tracking.c | 11 +-
tools/perf/tests/task-exit.c | 20 +-
tools/perf/tests/time-utils-test.c | 14 +-
tools/perf/tests/tool_pmu.c | 7 +-
tools/perf/tests/topology.c | 4 +-
tools/perf/tests/uncore-event-sorting.c | 6 +-
tools/perf/ui/browsers/annotate.c | 2 +-
tools/perf/ui/browsers/hists.c | 22 +-
tools/perf/util/Build | 1 -
tools/perf/util/amd-sample-raw.c | 2 +-
tools/perf/util/annotate-data.c | 2 +-
tools/perf/util/annotate.c | 10 +-
tools/perf/util/auxtrace.c | 14 +-
tools/perf/util/block-info.c | 4 +-
tools/perf/util/bpf_counter.c | 2 +-
tools/perf/util/bpf_counter_cgroup.c | 14 +-
tools/perf/util/bpf_ftrace.c | 9 +-
tools/perf/util/bpf_lock_contention.c | 12 +-
tools/perf/util/bpf_off_cpu.c | 44 +-
tools/perf/util/bpf_trace_augment.c | 8 +-
tools/perf/util/cgroup.c | 26 +-
tools/perf/util/cs-etm.c | 5 +-
tools/perf/util/data-convert-bt.c | 2 +-
tools/perf/util/data.c | 27 +-
tools/perf/util/data.h | 4 +-
tools/perf/util/evlist.c | 496 ++--
tools/perf/util/evlist.h | 273 +-
tools/perf/util/evsel.c | 39 +-
tools/perf/util/evsel.h | 40 +-
tools/perf/util/expr.c | 2 +-
tools/perf/util/header.c | 69 +-
tools/perf/util/header.h | 2 +-
tools/perf/util/intel-pt.c | 8 +-
tools/perf/util/intel-tpebs.c | 7 +-
tools/perf/util/iostat.c | 2 +-
tools/perf/util/iostat.h | 2 +-
tools/perf/util/map.h | 9 +-
tools/perf/util/metricgroup.c | 12 +-
tools/perf/util/parse-events.c | 10 +-
tools/perf/util/parse-events.y | 2 +-
tools/perf/util/perf_api_probe.c | 20 +-
tools/perf/util/pfm.c | 4 +-
tools/perf/util/print-events.c | 2 +-
tools/perf/util/python.c | 2846 ++++++++++++++++---
tools/perf/util/record.c | 11 +-
tools/perf/util/s390-sample-raw.c | 20 +-
tools/perf/util/sample-raw.c | 4 +-
tools/perf/util/sample.c | 17 +-
tools/perf/util/session.c | 69 +-
tools/perf/util/session.h | 2 +
tools/perf/util/setup.py | 5 +
tools/perf/util/sideband_evlist.c | 40 +-
tools/perf/util/sort.c | 2 +-
tools/perf/util/stat-display.c | 6 +-
tools/perf/util/stat-shadow.c | 24 +-
tools/perf/util/stat.c | 20 +-
tools/perf/util/stream.c | 4 +-
tools/perf/util/synthetic-events.c | 11 +-
tools/perf/util/time-utils.c | 12 +-
tools/perf/util/top.c | 4 +-
114 files changed, 4602 insertions(+), 1529 deletions(-)
create mode 100644 tools/perf/python/perf.pyi
create mode 100755 tools/perf/python/perf_live.py

--
2.54.0.1136.gdb2ca164c4-goog