Re: [PATCH v4 00/29] Add support for Clang LTO
From: Nick Desaulniers
Date: Wed Sep 30 2020 - 17:58:14 EST
On Tue, Sep 29, 2020 at 2:46 PM Sami Tolvanen <samitolvanen@xxxxxxxxxx> wrote:
>
> This patch series adds support for building x86_64 and arm64 kernels
> with Clang's Link Time Optimization (LTO).
>
> In addition to performance, the primary motivation for LTO is
> to allow Clang's Control-Flow Integrity (CFI) to be used in the
> kernel. Google has shipped millions of Pixel devices running three
> major kernel versions with LTO+CFI since 2018.
>
> Most of the patches are build system changes for handling LLVM
> bitcode, which Clang produces with LTO instead of ELF object files,
> postponing ELF processing until a later stage, and ensuring initcall
> ordering.
Sami, thanks for continuing to drive the series. I encourage you to
keep resending with fixes accumulated or dropped on a weekly cadence.
The series worked well for me on arm64, but for x86_64 on mainline I
saw a stream of new objtool warnings:
testing your LTO series; x86_64 defconfig + CONFIG_THINLTO:
``` LTO vmlinux.o OBJTOOL vmlinux.o vmlinux.o: warning: objtool:
wakeup_long64()+0x61: indirect jump found in RETPOLINE build
vmlinux.o: warning: objtool: .text+0x308a: indirect jump found in
RETPOLINE build vmlinux.o: warning: objtool: .text+0x30c5: indirect
jump found in RETPOLINE build vmlinux.o: warning: objtool:
copy_user_enhanced_fast_string() falls through to next function
copy_user_generic_unrolled() vmlinux.o: warning: objtool:
__memcpy_mcsafe() falls through to next function mcsafe_handle_tail()
vmlinux.o: warning: objtool: memset() falls through to next function
memset_erms() vmlinux.o: warning: objtool: __memcpy() falls through to
next function memcpy_erms() vmlinux.o: warning: objtool:
__x86_indirect_thunk_rax() falls through to next function
__x86_retpoline_rax() vmlinux.o: warning: objtool:
__x86_indirect_thunk_rbx() falls through to next function
__x86_retpoline_rbx() vmlinux.o: warning: objtool:
__x86_indirect_thunk_rcx() falls through to next function
__x86_retpoline_rcx() vmlinux.o: warning: objtool:
__x86_indirect_thunk_rdx() falls through to next function
__x86_retpoline_rdx() vmlinux.o: warning: objtool:
__x86_indirect_thunk_rsi() falls through to next function
__x86_retpoline_rsi() vmlinux.o: warning: objtool:
__x86_indirect_thunk_rdi() falls through to next function
__x86_retpoline_rdi() vmlinux.o: warning: objtool:
__x86_indirect_thunk_rbp() falls through to next function
__x86_retpoline_rbp() vmlinux.o: warning: objtool:
__x86_indirect_thunk_r8() falls through to next function
__x86_retpoline_r8() vmlinux.o: warning: objtool:
__x86_indirect_thunk_r9() falls through to next function
__x86_retpoline_r9() vmlinux.o: warning: objtool:
__x86_indirect_thunk_r10() falls through to next function
__x86_retpoline_r10() vmlinux.o: warning: objtool:
__x86_indirect_thunk_r11() falls through to next function
__x86_retpoline_r11() vmlinux.o: warning: objtool:
__x86_indirect_thunk_r12() falls through to next function
__x86_retpoline_r12() vmlinux.o: warning: objtool:
__x86_indirect_thunk_r13() falls through to next function
__x86_retpoline_r13() vmlinux.o: warning: objtool:
__x86_indirect_thunk_r14() falls through to next function
__x86_retpoline_r14() vmlinux.o: warning: objtool:
__x86_indirect_thunk_r15() falls through to next function
__x86_retpoline_r15() ```
I think those should be resolved before I provide any kind of tested
by tag. My other piece of feedback was that I like the default
ThinLTO, but I think the help text in the Kconfig which is visible
during menuconfig could be improved by informing the user the
tradeoffs. For example, if CONFIG_THINLTO is disabled, it should be
noted that full LTO will be used instead. Also, that full LTO may
produce slightly better optimized binaries than ThinLTO, at the cost
of not utilizing multiple cores when linking and thus significantly
slower to link.
Maybe explaining that setting it to "n" implies a full LTO build,
which will be much slower to link but possibly slightly faster would
be good? It's not visible unless LTO_CLANG and ARCH_SUPPORTS_THINLTO
is enabled, so I don't think you need to explain that THINLTO without
those is *not* full LTO. I'll leave the precise wording to you. WDYT?
Also, when I look at your treewide DISABLE_LTO patch, I think "does
that need to be a part of this series, or is it a cleanup that can
stand on its own?" I think it may be the latter? Maybe it would help
shed one more patch than to have to carry it to just send it? Or did
I miss something as to why it should remain a part of this series?
--
Thanks,
~Nick Desaulniers