Re: [PATCH 01/13] objtool: Rewrite hashtable sizing

From: Nathan Chancellor
Date: Thu Jun 10 2021 - 14:14:54 EST


Hi Peter,

On Thu, May 06, 2021 at 09:33:53PM +0200, Peter Zijlstra wrote:
> Currently objtool has 5 hashtables and sizes them 16 or 20 bits
> depending on the --vmlinux argument.
>
> However, a single size doesn't really work well for the 5 tables,
> which among them, cover 3 different uses. Also, while vmlinux is
> larger, there is still a very wide difference between a defconfig and
> allyesconfig build, which again isn't optimally covered by a single
> size.
>
> Another aspect is the cost of elf_hash_init(), which for large tables
> dominates the runtime for small input files. It turns out that all it
> does is assign NULL, something that is required when using malloc().
> However, when we allocate memory using mmap(), we're guaranteed to get
> zero filled pages.
>
> Therefore, rewrite the whole thing to:
>
> 1) use more dynamic sized tables, depending on the input file,
> 2) avoid the need for elf_hash_init() entirely by using mmap().
>
> This speeds up a regular kernel build (100s to 98s for
> x86_64-defconfig), and potentially dramatically speeds up vmlinux
> processing.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
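
For anyone following along, the zero-fill property the commit message relies on can be sketched roughly as below. This is only an illustration, not the objtool code: struct bucket, table_alloc() and table_free() are hypothetical names, and the sketch assumes a Linux/glibc system where MAP_ANONYMOUS is available.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical bucket type: one list-head pointer per hash slot. */
struct bucket {
	void *first;
};

/*
 * Allocate a table of (1 << bits) buckets with an anonymous private
 * mapping. The kernel hands back zero-filled pages, so every
 * bucket->first already reads as NULL and no init loop is needed.
 */
struct bucket *table_alloc(unsigned int bits)
{
	size_t size;
	void *p;

	assert(bits < 32);
	size = sizeof(struct bucket) << bits;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Release a table obtained from table_alloc() with the same bits. */
void table_free(struct bucket *table, unsigned int bits)
{
	munmap(table, sizeof(struct bucket) << bits);
}
```

A caller would do something like table_alloc(16), use the buckets directly, and call table_free() with the same bit count. With malloc() the same table would need an explicit loop (or calloc()) before first use.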

This patch as commit 25cf0d8aa2a3 ("objtool: Rewrite hashtable sizing")
in -tip causes a massive compile time regression with allmodconfig +
ThinLTO.

At v5.13-rc1, the performance penalty is only about 23%, as measured with
hyperfine for two runs [1]:

Benchmark #1: allmodconfig
Time (mean ± σ): 625.173 s ± 2.198 s [User: 35120.895 s, System: 2176.868 s]
Range (min … max): 623.619 s … 626.727 s 2 runs

Benchmark #2: allmodconfig with ThinLTO
Time (mean ± σ): 771.034 s ± 0.369 s [User: 39706.084 s, System: 2326.166 s]
Range (min … max): 770.773 s … 771.295 s 2 runs

Summary
'allmodconfig' ran
1.23 ± 0.00 times faster than 'allmodconfig with ThinLTO'

However, at 25cf0d8aa2a3, the penalty is almost 150% on a 64-core server:

Benchmark #1: allmodconfig
Time (mean ± σ): 624.759 s ± 2.153 s [User: 35114.379 s, System: 2145.456 s]
Range (min … max): 623.237 s … 626.281 s 2 runs

Benchmark #2: allmodconfig with ThinLTO
Time (mean ± σ): 1555.377 s ± 12.806 s [User: 40558.463 s, System: 2310.139 s]
Range (min … max): 1546.321 s … 1564.432 s 2 runs

Summary
'allmodconfig' ran
2.49 ± 0.02 times faster than 'allmodconfig with ThinLTO'
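
For clarity on how the percentages relate to the hyperfine means, here is a small sketch of the arithmetic. The function name slowdown_percent() is made up for illustration; the inputs are the mean wall-clock times reported in the two benchmark runs.

```c
#include <assert.h>

/*
 * Relative slowdown of 'test' versus 'baseline', in percent.
 * Both arguments are mean wall-clock times in seconds.
 */
double slowdown_percent(double baseline, double test)
{
	assert(baseline > 0.0);
	return (test - baseline) / baseline * 100.0;
}

/*
 * With the means from the runs:
 *   v5.13-rc1:     slowdown_percent(625.173, 771.034)   is roughly 23%
 *   25cf0d8aa2a3:  slowdown_percent(624.759, 1555.377)  is roughly 149%
 */
```

Put differently, the ThinLTO build itself goes from about 771 s to about 1555 s between the two commits, which is roughly a doubling, while the non-LTO build is unchanged.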

Adding Sami because I am not sure why this patch would have much of an impact
in relation to LTO. https://git.kernel.org/tip/25cf0d8aa2a3 is the patch in
question.

If I can provide any further information or help debug, please let me know.

If you are interested in reproducing this locally, you will need a
fairly recent LLVM stack (I used the stable release/12.x branch) and to
cherry-pick commit 976aac5f8829 ("kcsan: Fix debugfs initcall return
type") to fix an unrelated build failure. My script [2] can build a
self-contained toolchain fairly quickly if you cannot get one from your
package manager. A command like the one below will speed up the build a bit:

$ ./build-llvm.py \
    --branch "release/12.x" \
    --build-stage1-only \
    --install-stage1-only \
    --projects "clang;lld" \
    --targets X86

After adding the "install/bin" directory to PATH:

$ echo "CONFIG_GCOV_KERNEL=n
CONFIG_KASAN=n
CONFIG_LTO_CLANG_THIN=y" >allmod.config

$ make -skj"$(nproc)" LLVM=1 LLVM_IAS=1 allmodconfig all

[1]: https://github.com/sharkdp/hyperfine
[2]: https://github.com/ClangBuiltLinux/tc-build

Cheers,
Nathan