[PATCH v2 00/18] perf build: Reduce build time by nearly half

From: Ian Rogers

Date: Tue May 12 2026 - 13:53:29 EST


This patch series refactors Kbuild internals, BPF skeleton generation,
jevents.py AST pre-computation, and foundational tooling dependencies across
the perf tool build system. By eliminating umbrella target synchronization
barriers, decoupling static library prerequisites, parallelizing single-core
script generators, and removing redundant feature checks, this series
substantially increases the amount of work Make can run in parallel during
Kbuild startup.

On a 28-core build workstation (make -j28 all from scratch), clean build
latency improves by over 46%:

Before:
real 0m29.006s
user 2m46.019s
sys 0m30.610s

After:
real 0m15.655s
user 2m43.051s
sys 0m26.437s

This saves about 13.35 seconds per clean build. Furthermore, no-op
incremental builds (where nothing needs rebuilding) are nearly 7x faster:

Before:
real 0m11.528s
user 0m9.633s
sys 0m6.965s

After:
real 0m1.665s
user 0m1.501s
sys 0m0.841s
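
As a quick cross-check, the headline figures can be recomputed from the
"real" (wall-clock) times quoted above. This is only arithmetic on the
numbers in this letter; the helper names are ours, not part of the series.

```python
# Wall-clock ("real") times quoted above, converted to seconds.
CLEAN_BEFORE, CLEAN_AFTER = 29.006, 15.655
NOOP_BEFORE, NOOP_AFTER = 11.528, 1.665


def reduction_pct(before: float, after: float) -> float:
    """Share of the original wall-clock time that was removed, in percent."""
    return (before - after) / before * 100.0


def speedup(before: float, after: float) -> float:
    """How many times faster the new build is."""
    return before / after


if __name__ == "__main__":
    print(f"clean: {CLEAN_BEFORE - CLEAN_AFTER:.3f}s saved, "
          f"{reduction_pct(CLEAN_BEFORE, CLEAN_AFTER):.2f}% less, "
          f"{speedup(CLEAN_BEFORE, CLEAN_AFTER):.2f}x faster")
    print(f"no-op: {speedup(NOOP_BEFORE, NOOP_AFTER):.2f}x faster")
```

This gives a 13.351s (46.03%, about 1.85x) saving for the clean build and
about 6.92x for the no-op build. User CPU time barely moves (2m46s vs
2m43s), which is consistent with the clean-build gain coming mainly from
better parallelism rather than from less total work.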

Summary of Patches:

1-4: Foundational Tooling & Fast-Path Feature Detection
- Exempts the bpftool bootstrap build from non-essential feature tests (LLVM,
libbfd, libcap), saving 1.1s of sub-make fork overhead during Kbuild startup.
- Integrates libdebuginfod directly into test-all.c, so that on fully
configured workstations Make can skip the individual per-feature check
sub-make forks while parsing the makefiles.
- Fixes the test-clang-bpf-co-re.bin feature check so that it actually creates
its target file on disk, letting Make cache the detection result instead of
re-running the sub-make on every evaluation.
- Short-circuits the CC_NO_CLANG compiler inspection probe in Makefile.include
by exporting the cached result, eliminating 40+ redundant compiler forks
across the sub-make hierarchy.
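
The CC_NO_CLANG change follows a general pattern: run an expensive probe
once in the top-level process and hand the answer to every child through
the environment, so children reuse it instead of re-probing. The Python
sketch below models that pattern only. The function names are hypothetical,
and the probe merely approximates the $(CC) -dM -E check in Makefile.include.

```python
import os
import subprocess
from typing import Callable, MutableMapping

ENV_KEY = "CC_NO_CLANG"


def probe_cc_no_clang(cc: str = "cc") -> str:
    """Expensive probe (one compiler fork): dump the predefined macros and
    return "1" if __clang__ is absent, "0" if it is present."""
    macros = subprocess.run([cc, "-dM", "-E", "-x", "c", os.devnull],
                            capture_output=True, text=True, check=True).stdout
    return "0" if "__clang__" in macros else "1"


def cc_no_clang(env: MutableMapping[str, str],
                probe: Callable[[], str]) -> str:
    """Reuse a value exported by a parent process if there is one;
    otherwise probe once and export the result for child processes."""
    cached = env.get(ENV_KEY)
    if cached is not None:
        return cached
    result = probe()
    env[ENV_KEY] = result  # children inherit this instead of re-probing
    return result


if __name__ == "__main__":
    forks = 0

    def counting_probe() -> str:
        """Stand-in probe that only counts how often it would fork."""
        global forks
        forks += 1
        return "1"

    shared_env: dict = {}  # stands in for the exported environment
    for _ in range(41):    # top-level make plus 40 sub-makes
        cc_no_clang(shared_env, counting_probe)
    print(f"probe ran {forks} time(s)")
```

In a real process tree the mapping would be os.environ and the probe would
be probe_cc_no_clang. The cached answer is only trustworthy as long as every
child uses the same compiler as the process that exported it.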

5-7: Flattening Umbrella Prepare Barriers
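- The generated beauty C code that builtin-trace previously #included is now
built as standalone .o files, pmu-events generation is decoupled from the
sequential "prepare" umbrella target, and the empty archheaders target is
removed. This avoids parsing the same makefiles twice and removes barriers
that serialized parallel compilation.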
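
Why an umbrella target hurts can be seen with a toy critical-path model:
behind a barrier, every object waits for the slowest prepare step, even if
it only needs a cheap one. The target names and durations below are made up
for illustration, and the model assumes unlimited parallel jobs (it ignores
the -j limit).

```python
from typing import Dict, List, Tuple

# Hypothetical durations in seconds, chosen only for illustration.
PREPARE: Dict[str, float] = {
    "pmu-events-gen": 6.0,
    "beauty-gen": 2.0,
    "headers": 1.0,
}
# object -> (compile time, prepare steps it really depends on)
OBJECTS: Dict[str, Tuple[float, List[str]]] = {
    "pmu-events.o": (4.0, ["pmu-events-gen"]),
    "builtin-trace.o": (3.0, ["beauty-gen", "headers"]),
    "util objects": (8.0, ["headers"]),
}


def makespan(umbrella_barrier: bool) -> float:
    """Finish time of the last object, assuming unlimited parallel jobs and
    that every prepare step starts at t=0."""
    prepare_done = dict(PREPARE)
    umbrella_done = max(prepare_done.values())
    finish = []
    for cost, deps in OBJECTS.values():
        if umbrella_barrier:
            start = umbrella_done  # wait for *all* of "prepare"
        else:
            start = max(prepare_done[d] for d in deps)  # only real inputs
        finish.append(start + cost)
    return max(finish)


if __name__ == "__main__":
    print(f"with umbrella barrier: {makespan(True):.1f}s")
    print(f"with per-object deps:  {makespan(False):.1f}s")
```

With these numbers the barrier version finishes at 14.0s and the decoupled
version at 10.0s. What remains on the critical path is pmu-events generation
plus its compile, which is the part patches 14-15 target.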

8-11: Decoupling & Pre-generating BPF Skeletons
- BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
- Decouples bpftool bootstrap from top-level static libbpf dependencies,
attaching bpf-skel-prepare directly to the umbrella prepare target. This
allows Make to pre-compile bpftool and dump vmlinux.h in the background at
build startup, removing the 7-second serialization bottleneck before BPF
object compilation.

12-13: Foundational Linkage Optimization
- Eliminates redundant libbpf sub-make feature checks during static builds.
- Moves static libsymbol and libbpf library prerequisites out of the prepare step.

14-15: jevents.py Concurrency & Deduplication
- Splits the 2.8 MB big_c_string literal out of pmu-events.c into a
dedicated pmu-events-string.c compilation unit. This roughly halves C
compilation latency, because the string table and the struct tables are
compiled simultaneously on separate CPU cores, while still requiring no
dynamic ELF relocations.
- Pre-computes jevents.py JSON ASTs and metric formulas in parallel across
all available CPU cores using ProcessPoolExecutor, speeding up the Python
step by about 11x (from 3.3s down to ~290ms).
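
ProcessPoolExecutor fits here because pure-Python parsing is CPU-bound, so
threads would be serialized by the GIL. Below is a minimal sketch of the
fan-out, assuming each JSON file can be processed independently. The names
precompute and precompute_all are hypothetical, and the real per-file work
in jevents.py (metric AST construction and so on) is richer than this.

```python
import json
from concurrent.futures import ProcessPoolExecutor
from typing import List, Optional, Tuple

Entry = Tuple[str, str]


def precompute(json_text: str) -> List[Entry]:
    """Per-file work run in a worker process: parse one JSON file and
    return sorted (name, metric expression or "") pairs."""
    entries = json.loads(json_text)
    return sorted(
        (e.get("EventName") or e.get("MetricName", ""), e.get("MetricExpr", ""))
        for e in entries
    )


def precompute_all(json_texts: List[str],
                   workers: Optional[int] = None) -> List[List[Entry]]:
    """Spread the files across worker processes (default: one per CPU).
    Executor.map() yields results in input order, so the generated output
    stays deterministic regardless of which worker finishes first."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(precompute, json_texts, chunksize=8))


if __name__ == "__main__":
    files = [
        '[{"EventName": "cycles"}, {"EventName": "br_misp"}]',
        '[{"MetricName": "ipc", "MetricExpr": "instructions / cycles"}]',
    ]
    for result in precompute_all(files):
        print(result)
```

The __main__ guard matters: with the spawn or forkserver start methods, each
worker re-imports the main module, and an unguarded pool would be created
again inside the workers. chunksize batches many small files per task to
cut inter-process overhead.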

16: Out-of-Tree Incremental Rebuild Fix
- Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) so that Make
stops re-running the script installation rules on every invocation of an
already built out-of-tree build.
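
The rebuild loop comes from how Make decides a target is out of date: it
looks for a file with the target's exact name. If a rule is named
perf-archive but its recipe writes $(OUTPUT)perf-archive, the named file
never appears and the rule fires on every run. The Python model below
illustrates that check; the temporary directories stand in for an
out-of-tree build, and the stand-in recipe is our assumption about the old
rule, not a copy of Makefile.perf.

```python
import os
import shutil
import tempfile
from typing import List, Tuple


def needs_rebuild(target: str, prereqs: List[str]) -> bool:
    """Make's basic out-of-date test: the target file is missing, or a
    prerequisite is newer than it."""
    if not os.path.exists(target):
        return True
    built = os.path.getmtime(target)
    return any(os.path.getmtime(p) > built for p in prereqs)


def install_script(script: str, out_dir: str) -> str:
    """Stand-in recipe (assumed behavior): always writes into the output
    directory, whatever the rule's target is called."""
    dest = os.path.join(out_dir, os.path.basename(script)[: -len(".sh")])
    shutil.copyfile(script, dest)
    os.chmod(dest, 0o755)
    return dest


def check_targets() -> Tuple[bool, bool]:
    """Build once, then ask whether each spelling of the target is stale."""
    with tempfile.TemporaryDirectory() as src, \
            tempfile.TemporaryDirectory() as out:
        script = os.path.join(src, "perf-archive.sh")
        with open(script, "w") as f:
            f.write("#!/bin/sh\n")
        install_script(script, out)
        unprefixed = os.path.join(src, "perf-archive")  # target: perf-archive
        prefixed = os.path.join(out, "perf-archive")    # target: $(OUTPUT)perf-archive
        return (needs_rebuild(unprefixed, [script]),
                needs_rebuild(prefixed, [script]))


if __name__ == "__main__":
    stale_unprefixed, stale_prefixed = check_targets()
    print(f"unprefixed target stale: {stale_unprefixed}")  # True on every run
    print(f"prefixed target stale:   {stale_prefixed}")    # False once built
```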

17-18: AST Parsing Optimization & Shell Fork Eradication
- Converts ZENS, ARMS, and INTELS in pmu-events/Build from recursive assignment
(=) to simply expanded assignment (:=) and replaces model_name/vendor_name
with pure GNU Make string functions. The directory-probing shell commands
now run exactly once while Make parses the makefiles, and the path macros
are evaluated in memory, eliminating over 7,800 redundant sub-processes
during out-of-tree build evaluation.
- Converts llvm-config shell queries in Makefile.config from recursive assignment
(=) to simply expanded assignment (:=). This eliminates ~185 redundant sub-processes
that were previously spawned during object compilation dependency checks.
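
The difference between the two assignment flavors is easiest to see by
counting forks. The Python classes below are a hypothetical model of GNU
Make's expansion rules, not Make itself: a recursively expanded variable
re-runs its $(shell ...) on every reference, while a simply expanded one
runs it once when the assignment is read.

```python
from typing import Callable


class ShellCounter:
    """Fake $(shell ...): counts how many sub-processes would be forked."""

    def __init__(self) -> None:
        self.forks = 0

    def __call__(self, cmd: str) -> str:
        self.forks += 1
        return f"<output of {cmd}>"


class Recursive:
    """Models `VAR = $(shell ...)`: the body is re-expanded on every use."""

    def __init__(self, body: Callable[[], str]) -> None:
        self._body = body

    def expand(self) -> str:
        return self._body()


class Simple:
    """Models `VAR := $(shell ...)`: expanded once, at assignment time."""

    def __init__(self, body: Callable[[], str]) -> None:
        self._value = body()

    def expand(self) -> str:
        return self._value


def forks_for(flavor: type, references: int) -> int:
    """Define one variable of the given flavor, reference it N times, and
    report how many shell forks that cost."""
    shell = ShellCounter()
    var = flavor(lambda: shell("llvm-config --cflags"))
    for _ in range(references):
        var.expand()
    return shell.forks


if __name__ == "__main__":
    for flavor in (Recursive, Simple):
        print(flavor.__name__, forks_for(flavor, 100))
```

With 100 references the recursive flavor forks 100 times and the simple
flavor once. The trade-off of := is that the command runs even if nothing
references the variable, and later changes to variables it uses are not
picked up; for values like llvm-config output, which are fixed for the
whole build, neither matters.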

Changes since v1:
- Reorganized the commit order so that foundational build system and script
infrastructure patches precede the perf tool refactoring.
- Added Tested-by tag from James Clark on v1 patches.
- Eliminated redundant llvm-config shell forks and converted the PMU directory
probing variables to simply expanded assignments, removing over 7,800
redundant sub-processes during makefile parsing.
- Fixed test-clang-bpf-co-re.bin feature check caching and short-circuited CC_NO_CLANG
compiler probes across sub-makes.

Ian Rogers (18):
bpftool build: Restrict feature tests during bootstrap compilation
tools build: Integrate libdebuginfod into test-all fast path
tools build: Fix test-clang-bpf-co-re.bin to generate target file
tools scripts: Short-circuit CC_NO_CLANG compiler probe in Makefile.include
perf trace beauty: Make beauty generated C code standalone .o files
perf build: Decouple pmu-events from prepare umbrella target
perf build: Remove empty archheaders target
perf build: Move BPF skeleton generation out of Makefile.perf
perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
perf build: Move static libbpf dependency out of prepare step
perf build: Pre-generate BPF skeleton tooling during umbrella prepare phase
perf build: Move libsymbol dependency out of prepare step
perf build: Remove redundant libbpf feature check for static builds
perf pmu-events: Split big_c_string storage into standalone compilation unit
perf pmu-events: Parallelize JSON and metric pre-computation in jevents.py
perf build: Prefix SCRIPTS with output directory to fix continuous rebuilds
perf pmu-events: Convert recursive shell assignments and macros to Make built-ins
perf build: Convert llvm-config shell queries to simply expanded variables

tools/bpf/bpftool/Makefile | 5 +
tools/build/Makefile.feature | 6 +-
tools/build/feature/Makefile | 4 +-
tools/build/feature/test-all.c | 5 +
tools/perf/Build | 2 +
tools/perf/Makefile.config | 19 +-
tools/perf/Makefile.perf | 427 +-----------------
tools/perf/bench/Build | 6 +
.../bpf_skel/bench_uprobe.bpf.c | 0
tools/perf/bench/uprobe.c | 2 +-
tools/perf/bpf_skel.mak | 110 +++++
tools/perf/builtin-trace.c | 30 +-
tools/perf/pmu-events/Build | 25 +-
tools/perf/pmu-events/jevents.py | 56 ++-
tools/perf/trace/beauty/Build | 280 ++++++++++++
tools/perf/trace/beauty/arch_errno_names.c | 2 +
tools/perf/trace/beauty/arch_errno_names.sh | 2 +-
tools/perf/trace/beauty/beauty.h | 60 +++
tools/perf/trace/beauty/eventfd.c | 6 +-
tools/perf/trace/beauty/fsconfig.c | 5 +
tools/perf/trace/beauty/futex_op.c | 6 +-
tools/perf/trace/beauty/futex_val3.c | 6 +-
tools/perf/trace/beauty/mmap.c | 24 +-
tools/perf/trace/beauty/mode_t.c | 6 +-
tools/perf/trace/beauty/msg_flags.c | 8 +-
tools/perf/trace/beauty/open_flags.c | 1 +
tools/perf/trace/beauty/perf_event_open.c | 22 +-
tools/perf/trace/beauty/pid.c | 5 +-
tools/perf/trace/beauty/sched_policy.c | 8 +-
tools/perf/trace/beauty/seccomp.c | 12 +-
tools/perf/trace/beauty/signum.c | 6 +-
tools/perf/trace/beauty/socket_type.c | 6 +-
.../perf/{util => trace/beauty}/syscalltbl.c | 0
.../perf/{util => trace/beauty}/syscalltbl.h | 0
tools/perf/trace/beauty/tracepoints/Build | 22 +
tools/perf/trace/beauty/waitid_options.c | 8 +-
tools/perf/util/Build | 17 +-
tools/perf/util/bpf-trace-summary.c | 2 +-
tools/perf/util/env.c | 4 +-
tools/perf/util/env.h | 1 +
tools/scripts/Makefile.include | 3 +
41 files changed, 716 insertions(+), 503 deletions(-)
rename tools/perf/{util => bench}/bpf_skel/bench_uprobe.bpf.c (100%)
create mode 100644 tools/perf/bpf_skel.mak
create mode 100644 tools/perf/trace/beauty/fsconfig.c
rename tools/perf/{util => trace/beauty}/syscalltbl.c (100%)
rename tools/perf/{util => trace/beauty}/syscalltbl.h (100%)

--
2.54.0.563.g4f69b47b94-goog