[PATCH v1 00/14] perf build: Reduce build time by one third
From: Ian Rogers
Date: Tue May 12 2026 - 01:49:51 EST
This patch series refactors many aspects of the perf build to better
encapsulate BPF code generation, remove serial build code and gain
build parallelism. The prepare step that blocks the parallel build is
reduced to a core of 6 smaller dependencies. BPF skeletons are made
regular dependencies of the targets that use them. Feature tests and
dependencies are reorganized. The jevents.py script processes JSON
files in parallel and allows the big_c_string to be compiled
separately.
On a 28-core build workstation (make -j28 all from scratch), the
wall-clock time of a clean build improves by over 36%:
Before:
real 0m29.006s
user 2m46.019s
sys 0m30.610s
After:
real 0m18.498s
user 2m32.922s
sys 0m27.623s
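As a quick check of the figures above, the percentages can be recomputed
from the time output. This is only an illustrative sketch; the helper
names (to_seconds, reduction_pct, cpu_to_wall_ratio) are made up for the
example and are not part of the series.

```python
# Illustrative check of the timings quoted above; helper names are made up.


def to_seconds(minutes: int, seconds: float) -> float:
    """Convert a time(1)-style 'XmY.ZZZs' reading to seconds."""
    return minutes * 60 + seconds


BEFORE = {
    "real": to_seconds(0, 29.006),
    "user": to_seconds(2, 46.019),
    "sys": to_seconds(0, 30.610),
}
AFTER = {
    "real": to_seconds(0, 18.498),
    "user": to_seconds(2, 32.922),
    "sys": to_seconds(0, 27.623),
}


def reduction_pct(before: float, after: float) -> float:
    """Percentage by which 'after' is lower than 'before'."""
    return (before - after) / before * 100.0


def cpu_to_wall_ratio(times: dict) -> float:
    """(user + sys) / real: a rough measure of how many cores were kept busy."""
    return (times["user"] + times["sys"]) / times["real"]


if __name__ == "__main__":
    for key in ("real", "user", "sys"):
        print(f"{key}: {reduction_pct(BEFORE[key], AFTER[key]):.1f}% lower")
    print(f"CPU/wall ratio before: {cpu_to_wall_ratio(BEFORE):.2f}")
    print(f"CPU/wall ratio after:  {cpu_to_wall_ratio(AFTER):.2f}")
```

The real-time reduction works out to about 36.2%, while user and sys time
each fall by less than 10%, which suggests the gain comes mostly from
better parallelism rather than from doing less work.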
Summary of Patches:
1: bpftool Bootstrap Optimization
- Exempts bpftool bootstrap from non-essential feature tests (LLVM, libbfd,
libcap), saving 1.1s of sub-make fork overhead during Kbuild startup.
2-4: Flattening Umbrella Prepare Barriers
- builtin-trace's embedded inclusions and the pmu-events generation are
completely decoupled from the sequential "prepare" umbrella target,
eliminating Make AST double-parsing overhead and removing barriers to
parallel compilation.
5-8: Decoupling & Pre-generating BPF Skeletons
- BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
- Decouples bpftool bootstrap from top-level static libbpf dependencies,
attaching bpf-skel-prepare directly to the umbrella prepare target. This
allows Make to pre-compile bpftool and dump vmlinux.h in the background at
build startup, removing the 7-second serialization bottleneck before BPF
object compilation.
9-11: Foundational Linkage & Fast-Path Feature Detection
- Eliminates redundant libbpf sub-make feature checks during static builds.
- Integrates libdebuginfod directly into test-all.c, allowing Make to skip
individual feature check sub-make forks during AST parsing on fully
configured workstations.
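The fast-path idea behind test-all.c can be sketched as follows: one
combined probe is tried first, and individual probes run only if it
fails. This is a simplified model in Python, not the Makefile.feature
logic itself; try_build and the feature names in the usage note are
hypothetical stand-ins for the real compile checks.

```python
from typing import Callable, Dict, Iterable, List


def detect_features(features: Iterable[str],
                    try_build: Callable[[List[str]], bool]) -> Dict[str, bool]:
    """Return {feature: available}, preferring a single combined probe.

    try_build is a hypothetical callback that compiles one test program
    exercising the given features and reports whether it built.
    """
    names = list(features)
    # Fast path: one build that exercises every feature at once.
    if try_build(names):
        return {name: True for name in names}
    # Slow path: the combined probe failed, so probe each feature alone.
    return {name: try_build([name]) for name in names}
```

In this model, any feature left out of the combined probe would always
need its own build, which is why folding libdebuginfod into the combined
test saves a fork on machines where everything is installed.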
12-13: jevents.py Concurrency & Deduplication
- Splits the massive 2.8 MB big_c_string literal out of pmu-events.c into a
dedicated pmu-events-string.c compilation unit. This halves C compilation
latency by compiling the string and struct tables simultaneously on
separate CPU cores, while still requiring zero dynamic ELF relocations.
- Pre-populates jevents.py JSON ASTs and metric formulas in parallel across
all available CPU cores using ProcessPoolExecutor (accelerating Python
execution by 11x, from 3.3s down to ~290ms).
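The parallel JSON pre-population described above follows a common
pattern, sketched here under stated assumptions. This is not the
jevents.py code: parse_one, parse_all and the throwaway event files are
hypothetical, and only the use of ProcessPoolExecutor is taken from the
description.

```python
import json
import tempfile
from concurrent.futures import Executor, ProcessPoolExecutor
from pathlib import Path
from typing import Dict, List, Tuple


def parse_one(path: str) -> Tuple[str, object]:
    """Parse a single JSON file; intended to run in a worker process."""
    with open(path, encoding="utf-8") as f:
        return path, json.load(f)


def parse_all(paths: List[str], executor: Executor) -> Dict[str, object]:
    """Parse every file on the given executor, keyed by path.

    Executor.map() yields results in input order, so the output does not
    depend on which worker happens to finish first.
    """
    return dict(executor.map(parse_one, sorted(paths)))


if __name__ == "__main__":
    # Demo on throwaway files; a real generator would walk the event tree.
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(4):
            Path(tmp, f"events{i}.json").write_text(
                json.dumps([{"EventName": f"ev{i}"}]))
        files = [str(p) for p in Path(tmp).glob("*.json")]
        # With no max_workers argument the pool sizes itself to the CPUs.
        with ProcessPoolExecutor() as pool:
            events = parse_all(files, pool)
        print(len(events), "files parsed")
```

Keeping the result order tied to the sorted input matters for a code
generator, since the emitted C file should be identical from one build to
the next regardless of scheduling.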
14: Out-of-Tree Incremental Rebuild Fix
- Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) to prevent
Make from re-executing the script installation rules on every invocation
in an already-built out-of-tree build directory.
Ian Rogers (14):
bpftool build: Restrict feature tests during bootstrap compilation
perf trace beauty: Make beauty generated C code standalone .o files
perf build: Decouple pmu-events from prepare umbrella target
perf build: Remove empty archheaders target
perf build: Move BPF skeleton generation out of Makefile.perf
perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
perf build: Move static libbpf dependency out of prepare step
perf build: Pre-generate BPF skeletons during umbrella prepare phase
perf build: Move libsymbol dependency out of prepare step
perf build: Remove redundant libbpf feature check for static builds
tools build: Integrate libdebuginfod into test-all fast path
perf pmu-events: Split big_c_string storage into standalone
compilation unit
perf pmu-events: Parallelize JSON and metric pre-computation in
jevents.py
perf build: Prefix SCRIPTS with output directory to fix continuous
rebuilds
tools/bpf/bpftool/Makefile | 5 +
tools/build/Makefile.feature | 6 +-
tools/build/feature/Makefile | 2 +-
tools/build/feature/test-all.c | 5 +
tools/perf/Build | 2 +
tools/perf/Makefile.config | 6 +-
tools/perf/Makefile.perf | 427 +-----------------
tools/perf/bench/Build | 6 +
.../bpf_skel/bench_uprobe.bpf.c | 0
tools/perf/bench/uprobe.c | 2 +-
tools/perf/bpf_skel.mak | 110 +++++
tools/perf/builtin-trace.c | 30 +-
tools/perf/pmu-events/Build | 15 +-
tools/perf/pmu-events/jevents.py | 56 ++-
tools/perf/trace/beauty/Build | 280 ++++++++++++
tools/perf/trace/beauty/arch_errno_names.c | 2 +
tools/perf/trace/beauty/arch_errno_names.sh | 2 +-
tools/perf/trace/beauty/beauty.h | 60 +++
tools/perf/trace/beauty/eventfd.c | 6 +-
tools/perf/trace/beauty/fsconfig.c | 5 +
tools/perf/trace/beauty/futex_op.c | 6 +-
tools/perf/trace/beauty/futex_val3.c | 6 +-
tools/perf/trace/beauty/mmap.c | 24 +-
tools/perf/trace/beauty/mode_t.c | 6 +-
tools/perf/trace/beauty/msg_flags.c | 8 +-
tools/perf/trace/beauty/open_flags.c | 1 +
tools/perf/trace/beauty/perf_event_open.c | 22 +-
tools/perf/trace/beauty/pid.c | 5 +-
tools/perf/trace/beauty/sched_policy.c | 8 +-
tools/perf/trace/beauty/seccomp.c | 12 +-
tools/perf/trace/beauty/signum.c | 6 +-
tools/perf/trace/beauty/socket_type.c | 6 +-
.../perf/{util => trace/beauty}/syscalltbl.c | 0
.../perf/{util => trace/beauty}/syscalltbl.h | 0
tools/perf/trace/beauty/tracepoints/Build | 22 +
tools/perf/trace/beauty/waitid_options.c | 8 +-
tools/perf/util/Build | 17 +-
tools/perf/util/bpf-trace-summary.c | 2 +-
tools/perf/util/env.c | 4 +-
tools/perf/util/env.h | 1 +
40 files changed, 700 insertions(+), 491 deletions(-)
rename tools/perf/{util => bench}/bpf_skel/bench_uprobe.bpf.c (100%)
create mode 100644 tools/perf/bpf_skel.mak
create mode 100644 tools/perf/trace/beauty/fsconfig.c
rename tools/perf/{util => trace/beauty}/syscalltbl.c (100%)
rename tools/perf/{util => trace/beauty}/syscalltbl.h (100%)
--
2.54.0.563.g4f69b47b94-goog