[ANNOUNCE] "Fast Kernel Headers" Tree -v2

From: Ingo Molnar
Date: Sat Jan 08 2022 - 11:26:54 EST



I'm pleased to announce -v2 of the "Fast Kernel Headers" tree, which is a
comprehensive rework of the Linux kernel's header hierarchy & header
dependencies, with the dual goals of:

- speeding up the kernel build (both absolute and incremental build times)

- decoupling subsystem type & API definitions from each other

The fast-headers tree consists of over 25 sub-trees internally, spanning
over 2,300 commits, which can be found at:

git://git.kernel.org/pub/scm/linux/kernel/git/mingo/tip.git master

# HEAD: 391ce485ced0 headers/deps: Introduce the CONFIG_FAST_HEADERS=y config option

Changes in -v2:

- Port to v5.16-rc8

- Clang/LLVM support (with the help of Nathan Chancellor):

On my 'reference distro config' the build speedup under Clang is around +88%
in elapsed time and +77% in CPU time used:

#
# v5.16-rc8
#
Performance counter stats for 'make -j96 vmlinux LLVM=1' (3 runs):

18,490,451.51 msec cpu-clock # 54.740 CPUs utilized ( +- 0.04% )

337.788 +- 0.834 seconds time elapsed ( +- 0.25% )

#
# -fast-headers-v2
#
Performance counter stats for 'make -j96 vmlinux LLVM=1' (3 runs):

10,443,670.86 msec cpu-clock # 58.093 CPUs utilized ( +- 0.00% )

179.773 +- 0.829 seconds time elapsed ( +- 0.46% )

- Unify the duplicated 'struct task_struct_per_task' into a single definition,
which should address the definition ugliness reported by Greg Kroah-Hartman.

- Fix bugs reported by Nathan Chancellor:

- cacheline attribute definition bug
- build bug with GCC plugins
- out-of-tree build breakage

- Header optimizations that speed up the RDMA (infiniband) subsystem build
by about +9% over -v1 and +41% over the vanilla kernel:

$ perf stat --repeat 3 -e instructions,cycles,cpu-clock --sync --pre "find . -name '*.o' | xargs rm" m-rdma >/dev/null
...

# v5.16-rc8:

643,570.38 msec cpu-clock # 52.253 CPUs utilized ( +- 0.06% )

12.316 +- 0.183 seconds time elapsed ( +- 1.49% )

# -fast-headers-v1:

446,243.49 msec cpu-clock # 47.106 CPUs utilized ( +- 0.06% )

9.4731 +- 0.0666 seconds time elapsed ( +- 0.70% )

# -fast-headers-v2:

400,650.32 msec cpu-clock # 45.888 CPUs utilized ( +- 0.02% )

8.7310 +- 0.0162 seconds time elapsed ( +- 0.19% )

- Another round of <linux/sched.h> header footprint reductions: the
header is now used in only ~36% of .c files, down from 99% in the
mainline kernel and 68% in -v1.

- Various bisectability improvements & other fixes & optimizations.
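
For anyone who wants to cross-check the percentages quoted above against the
raw perf stat figures, here is a minimal sketch. It assumes "speedup" means
(old / new - 1) * 100, which is my reading of the numbers and is not spelled
out in the announcement; the helper name speedup_pct and the dictionary
layout are made up for illustration.

```python
# Recompute the quoted build speedups from the perf stat figures above.
# Assumption: speedup % = (baseline / improved - 1) * 100.

def speedup_pct(baseline: float, improved: float) -> float:
    """Return how much faster 'improved' is than 'baseline', in percent."""
    return (baseline / improved - 1.0) * 100.0

# (baseline, improved) pairs copied from the perf stat output.
measurements = {
    # Clang/LLVM full vmlinux build, v5.16-rc8 vs -fast-headers-v2
    "clang elapsed (s)":        (337.788, 179.773),
    "clang cpu-clock (msec)":   (18_490_451.51, 10_443_670.86),
    # RDMA subsystem build, elapsed seconds
    "rdma v2 vs v5.16-rc8 (s)": (12.316, 8.7310),
    "rdma v2 vs v1 (s)":        (9.4731, 8.7310),
}

results = {name: speedup_pct(old, new) for name, (old, new) in measurements.items()}

for name, pct in results.items():
    print(f"{name:26s} +{pct:.1f}%")
```

Under that definition the Clang figures come out at roughly +88% elapsed and
+77% CPU time, and the RDMA build at roughly +41% over v5.16-rc8. The -v2
over -v1 RDMA figure comes out at roughly +8.5% in elapsed time, which is
consistent with the "about +9%" quoted above.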

Thanks,

Ingo