Re: [PATCH v3 00/17] perf build: Reduce build time by nearly half

From: Namhyung Kim

Date: Thu May 14 2026 - 18:07:27 EST


On Thu, May 14, 2026 at 09:33:52AM -0700, Ian Rogers wrote:
> This patch series refactors Kbuild internals, BPF skeleton generation,
> Python AST pre-computation, and foundational tooling dependencies across
> the perf tool build system. By eliminating umbrella target synchronization
> barriers, decoupling static library prerequisites, parallelizing single-core
> script generators, and eradicating redundant feature checks, this series
> unlocks absolute theoretical peak multi-core concurrency during Kbuild startup.
>
> On a 28-core build workstation (make -j28 all from scratch), clean build
> latency improves by over 49%:
>
> Before:
> real 0m29.006s
> user 2m46.019s
> sys 0m30.610s
>
> After:
> real 0m14.782s
> user 2m39.527s
> sys 0m22.938s
>
> This saves 14.2 seconds per clean build. Furthermore, incremental builds
> with nothing to rebuild are nearly 7x faster:
>
> Before:
> real 0m11.528s
> user 0m9.633s
> sys 0m6.965s
>
> After:
> real 0m1.729s
> user 0m1.600s
> sys 0m0.884s

I quickly checked it with latency profiling, like below:

$ perf record --latency -- make -C tools/perf

$ perf report --latency -s comm

The result looks like this.

Before:
#
# Samples: 715K of event 'cpu/cycles/Pu'
# Event count (approx.): 422452811481
#
# Latency   Overhead  Command
# ........  ........  ...............
#
    45.28%    71.33%  cc1
    34.48%    16.92%  python3
    11.15%     2.21%  ld
     2.58%     1.51%  x86_64-linux-gn
     2.22%     0.99%  cc1plus
     0.71%     0.63%  sh
     0.69%     0.14%  llvm-config
     0.62%     0.56%  clang
     0.57%     4.40%  shellcheck
     0.44%     0.12%  perl

After:
#
# Samples: 709K of event 'cpu/cycles/Pu'
# Event count (approx.): 416654798495
#
# Latency   Overhead  Command
# ........  ........  ...............
#
    64.99%    71.16%  cc1
    15.07%     1.81%  ld
     7.14%    17.59%  python3
     3.66%     1.53%  x86_64-linux-gn
     3.48%     0.75%  cc1plus
     1.11%     4.43%  shellcheck
     1.09%     0.74%  sh
     0.86%     0.59%  clang
     0.77%     0.12%  perl
     0.45%     0.23%  make

Now I see a big drop in latency from python3 (34.48% -> 7.14%), and
llvm-config no longer shows up in the top 10.
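
For anyone who wants to compare the two reports mechanically, here is a
rough sketch. The numbers are the Latency column copied from the output
above; the helper names (latency_deltas, dropped_out) are made up for this
mail. Note that each value is a share of that run's total latency, so a
rising share (cc1) does not by itself mean the command got slower.

```python
# Latency column (percent) copied from the two perf reports above.
before = {"cc1": 45.28, "python3": 34.48, "ld": 11.15, "x86_64-linux-gn": 2.58,
          "cc1plus": 2.22, "sh": 0.71, "llvm-config": 0.69, "clang": 0.62,
          "shellcheck": 0.57, "perl": 0.44}
after = {"cc1": 64.99, "ld": 15.07, "python3": 7.14, "x86_64-linux-gn": 3.66,
         "cc1plus": 3.48, "shellcheck": 1.11, "sh": 1.09, "clang": 0.86,
         "perl": 0.77, "make": 0.45}


def latency_deltas(before, after):
    """Rows of (command, before, after, delta) for commands present in both
    top-10 lists, sorted by delta with the largest drop first."""
    common = before.keys() & after.keys()
    rows = [(c, before[c], after[c], round(after[c] - before[c], 2))
            for c in common]
    return sorted(rows, key=lambda r: r[3])


def dropped_out(before, after):
    """Commands listed in the 'before' top 10 but missing from 'after'."""
    return sorted(before.keys() - after.keys())


for cmd, b, a, d in latency_deltas(before, after):
    print(f"{cmd:16s} {b:6.2f}% -> {a:6.2f}%  ({d:+.2f} points)")
print("no longer in top 10:", dropped_out(before, after))
```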

Thanks,
Namhyung

>
> Summary of Patches:
>
> 1-3: Foundational Tooling & Fast-Path Feature Detection
> - Exempts bpftool bootstrap from non-essential feature tests (LLVM, libbfd,
> libcap), saving 1.1s of sub-make fork overhead during Kbuild startup.
> - Integrates libdebuginfod directly into test-all.c, allowing Make to skip
> individual feature check sub-make forks during AST parsing on fully
> configured workstations. Escapes $(shell ...) macro expansion to prevent
> unconditional sub-make forks.
> - Fixes test-clang-bpf-co-re.bin feature check to correctly generate its
> target file on disk via atomic move (> $@.tmp && mv $@.tmp $@), allowing
> Kbuild to perfectly cache the detection result and avoid continuous sub-make
> re-evaluations.
>
> 4-6: Flattening Umbrella Prepare Barriers
> - builtin-trace embedded inclusions and pmu-events generation are completely
> decoupled from the sequential "prepare" umbrella target, eliminating Make
> AST double-parsing overhead and unchoking parallel compilation barriers.
>
> 7-10: Decoupling & Pre-generating BPF Skeletons
> - BPF skeleton rules are extracted out of Makefile.perf into bpf_skel.mak.
> - Decouples bpftool bootstrap from top-level static libbpf dependencies,
> attaching bpf-skel-prepare directly to the umbrella prepare target. This
> allows Make to pre-compile bpftool and dump vmlinux.h in the background at
> build startup, removing the 7-second serialization bottleneck before BPF
> object compilation.
> - Ensures benchmark skeleton intermediate .bpf.o files are cleanly removed
> during make clean, and adds bpf-skel-prepare to .PHONY.
>
> 11-12: Foundational Linkage Optimization
> - Eliminates redundant libbpf sub-make feature checks during static builds.
> - Moves static libsymbol and libbpf library prerequisites out of the
> prepare step, ensuring libbpf headers are installed before
> compiling BPF-dependent tests.
>
> 13-14: jevents.py Concurrency & Deduplication
> - Splits the massive 2.8 MB big_c_string literal out of pmu-events.c
> into a dedicated pmu-events-string.c compilation unit. This slices
> C compilation latency in half by compiling string and struct
> tables simultaneously across separate CPU cores while preserving
> zero dynamic ELF relocations. Adds pmu-events-string.c to
> .gitignore and uses Make 4.0 compatible dependency chaining.
> - Pre-populates jevents.py JSON ASTs and metric formulas in parallel across
> all available CPU cores using ProcessPoolExecutor (accelerating Python
> execution by 11x, from 3.3s down to ~290ms). Moves _init_worker to top-level
> scope to ensure clean pickling under spawn multiprocessing start methods.
>
> 15: Out-of-Tree Incremental Rebuild Fix
> - Prefixes SCRIPTS (perf-archive, perf-iostat) with $(OUTPUT) to prevent
> Make from continuously re-executing script installation rules on already
> built out-of-tree builds.
>
> 16-17: AST Parsing Optimization & Shell Fork Eradication
> - Converts ZENS, ARMS, and INTELS in pmu-events/Build from recursive
> assignment (=) to simply expanded assignment (:=) and replaces
> model_name/vendor_name with pure GNU Make string functions. This
> guarantees Make executes directory probing shell forks exactly
> once during AST parsing and evaluates path macros purely in
> memory, completely eradicating over 7,800 redundant sub-processes
> during out-of-tree build evaluation.
> - Converts llvm-config shell queries in Makefile.config from
> recursive assignment (=) to simply expanded assignment (:=). This
> eliminates ~185 redundant sub-processes that were previously
> executed across object compilation dependency checks.
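
To illustrate why the assignment flavor matters, here is a toy Python model
of the two behaviors. It is an analogy only, not a Make parser: MakeVars,
fake_shell and count_forks are invented names, and the reference count of
100 is arbitrary. A recursively expanded variable (=) re-runs its
$(shell ...) on every reference, while a simply expanded one (:=) runs it
once at assignment time.

```python
class MakeVars:
    """Toy model of GNU Make's two variable flavors."""

    def __init__(self):
        self._recursive = {}  # name -> callable, re-run on every reference
        self._simple = {}     # name -> value, computed once at assignment

    def assign_recursive(self, name, thunk):
        """Models: NAME = $(shell ...)"""
        self._recursive[name] = thunk

    def assign_simple(self, name, thunk):
        """Models: NAME := $(shell ...)"""
        self._simple[name] = thunk()

    def ref(self, name):
        """Models: $(NAME)"""
        if name in self._simple:
            return self._simple[name]
        return self._recursive[name]()


forks = {"count": 0}


def fake_shell():
    """Stand-in for a $(shell ...) probe; counts how often it runs."""
    forks["count"] += 1
    return "probe-output"  # placeholder, not real tool output


def count_forks(assign, references=100):
    """Assign X with the given flavor, reference it repeatedly, and return
    how many times the fake shell probe ran."""
    forks["count"] = 0
    mv = MakeVars()
    assign(mv, "X", fake_shell)
    for _ in range(references):
        mv.ref("X")
    return forks["count"]


print("recursive (=): ", count_forks(MakeVars.assign_recursive))
print("simple    (:=):", count_forks(MakeVars.assign_simple))
```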
>
> Changes since v2:
> - Dropped Patch 4 (tools scripts: Short-circuit CC_NO_CLANG compiler
> probe in Makefile.include) to prevent potential cross-compilation
> regressions when CC and HOSTCC use different compilers.
> - tools build (Patch 2): Escaped $(shell ...) macro expansion as
> $$(shell ...) inside define feature_check_code to safely defer
> sub-make execution until after eval parses the ifeq guard.
> - tools build (Patch 3): Refactored test-clang-bpf-co-re.bin feature
> check recipe to redirect grep output to a temporary file and
> atomically move it upon success (> $@.tmp && mv $@.tmp $@),
> preventing Kbuild from permanently caching failed detections due to
> 0-byte files.
> - perf trace beauty (Patch 4): Updated commit description to accurately
> reflect the unconditional top-level recursive kbuild hook
> (perf-util-y += trace/beauty/).
> - perf build (Patch 7): Added $(OUTPUT)bench/bpf_skel/.tmp to
> bpf-skel-clean in Makefile.perf to ensure intermediate benchmark
> skeleton .bpf.o artifacts are cleanly removed during make clean.
> Removed unused bpf_skel_deps variable from bpf_skel.mak.
> - perf build (Patch 9): Added $(LIBBPF) as an explicit prerequisite to
> $(LIBPERF_TEST_IN) in Makefile.perf to guarantee libbpf headers are
> fully installed before compiling sigtrap.c or other BPF-dependent
> tests during parallel builds.
> - perf build (Patch 10): Added bpf-skel-prepare to the .PHONY target
> list in Makefile.perf to ensure Make never incorrectly skips the
> target if a file or directory named bpf-skel-prepare accidentally
> exists in the build tree.
> - perf pmu-events (Patch 13): Added pmu-events/pmu-events-string.c to
> tools/perf/.gitignore. Replaced grouped targets (&:) with Make 4.0
> compatible dependency chaining to guarantee backward compatibility
> with older Make versions (like 4.2.1) and prevent parallel builds
> from spawning multiple concurrent jevents.py processes.
> - perf pmu-events (Patch 14): Moved _init_worker from local main()
> scope to the top-level module scope in jevents.py to ensure it can be
> cleanly pickled when ProcessPoolExecutor uses the spawn
> multiprocessing start method (avoiding AttributeError crashes).
>
> Ian Rogers (17):
>   bpftool build: Restrict feature tests during bootstrap compilation
>   tools build: Integrate libdebuginfod into test-all fast path
>   tools build: Fix test-clang-bpf-co-re.bin to generate target file
>   perf trace beauty: Make beauty generated C code standalone .o files
>   perf build: Decouple pmu-events from prepare umbrella target
>   perf build: Remove empty archheaders target
>   perf build: Move BPF skeleton generation out of Makefile.perf
>   perf build: Encapsulate vmlinux.h and bpftool in bpf_skel.mak
>   perf build: Move static libbpf dependency out of prepare step
>   perf build: Pre-generate BPF skeleton tooling during umbrella prepare
>     phase
>   perf build: Move libsymbol dependency out of prepare step
>   perf build: Remove redundant libbpf feature check for static builds
>   perf pmu-events: Split big_c_string storage into standalone
>     compilation unit
>   perf pmu-events: Parallelize JSON and metric pre-computation in
>     jevents.py
>   perf build: Prefix SCRIPTS with output directory to fix continuous
>     rebuilds
>   perf pmu-events: Convert recursive shell assignments and macros to
>     Make built-ins
>   perf build: Convert llvm-config shell queries to simply expanded
>     variables
>
> tools/bpf/bpftool/Makefile | 5 +
> tools/build/Makefile.feature | 6 +-
> tools/build/feature/Makefile | 4 +-
> tools/build/feature/test-all.c | 5 +
> tools/perf/.gitignore | 1 +
> tools/perf/Build | 2 +
> tools/perf/Makefile.config | 19 +-
> tools/perf/Makefile.perf | 431 ++----------------
> tools/perf/bench/Build | 6 +
> .../bpf_skel/bench_uprobe.bpf.c | 0
> tools/perf/bench/uprobe.c | 2 +-
> tools/perf/bpf_skel.mak | 109 +++++
> tools/perf/builtin-trace.c | 30 +-
> tools/perf/pmu-events/Build | 26 +-
> tools/perf/pmu-events/jevents.py | 56 ++-
> tools/perf/trace/beauty/Build | 280 ++++++++++++
> tools/perf/trace/beauty/arch_errno_names.c | 2 +
> tools/perf/trace/beauty/arch_errno_names.sh | 2 +-
> tools/perf/trace/beauty/beauty.h | 60 +++
> tools/perf/trace/beauty/eventfd.c | 6 +-
> tools/perf/trace/beauty/fsconfig.c | 5 +
> tools/perf/trace/beauty/futex_op.c | 6 +-
> tools/perf/trace/beauty/futex_val3.c | 6 +-
> tools/perf/trace/beauty/mmap.c | 24 +-
> tools/perf/trace/beauty/mode_t.c | 6 +-
> tools/perf/trace/beauty/msg_flags.c | 8 +-
> tools/perf/trace/beauty/open_flags.c | 1 +
> tools/perf/trace/beauty/perf_event_open.c | 22 +-
> tools/perf/trace/beauty/pid.c | 5 +-
> tools/perf/trace/beauty/sched_policy.c | 8 +-
> tools/perf/trace/beauty/seccomp.c | 12 +-
> tools/perf/trace/beauty/signum.c | 6 +-
> tools/perf/trace/beauty/socket_type.c | 6 +-
> .../perf/{util => trace/beauty}/syscalltbl.c | 0
> .../perf/{util => trace/beauty}/syscalltbl.h | 0
> tools/perf/trace/beauty/tracepoints/Build | 22 +
> tools/perf/trace/beauty/waitid_options.c | 8 +-
> tools/perf/util/Build | 17 +-
> tools/perf/util/bpf-trace-summary.c | 2 +-
> tools/perf/util/env.c | 4 +-
> tools/perf/util/env.h | 1 +
> 41 files changed, 717 insertions(+), 504 deletions(-)
> rename tools/perf/{util => bench}/bpf_skel/bench_uprobe.bpf.c (100%)
> create mode 100644 tools/perf/bpf_skel.mak
> create mode 100644 tools/perf/trace/beauty/fsconfig.c
> rename tools/perf/{util => trace/beauty}/syscalltbl.c (100%)
> rename tools/perf/{util => trace/beauty}/syscalltbl.h (100%)
>
> --
> 2.54.0.563.g4f69b47b94-goog
>