Re: [PATCH v8 00/16] Add support for Clang LTO

From: Sami Tolvanen
Date: Tue Dec 08 2020 - 11:44:01 EST


On Tue, Dec 8, 2020 at 4:15 AM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
>
> On Tue, Dec 1, 2020 at 10:37 PM 'Sami Tolvanen' via Clang Built Linux
> <clang-built-linux@xxxxxxxxxxxxxxxx> wrote:
> >
> > This patch series adds support for building the kernel with Clang's
> > Link Time Optimization (LTO). In addition to performance, the primary
> > motivation for LTO is to allow Clang's Control-Flow Integrity (CFI)
> > to be used in the kernel. Google has shipped millions of Pixel
> > devices running three major kernel versions with LTO+CFI since 2018.
> >
> > Most of the patches are build system changes for handling LLVM
> > bitcode, which Clang produces with LTO instead of ELF object files,
> > postponing ELF processing until a later stage, and ensuring initcall
> > ordering.
> >
> > Note that arm64 support depends on Will's memory ordering patches
> > [1]. I will post x86_64 patches separately after we have fixed the
> > remaining objtool warnings [2][3].
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/lto
> > [2] https://lore.kernel.org/lkml/20201120040424.a3wctajzft4ufoiw@treble/
> > [3] https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=objtool-vmlinux
> >
> > You can also pull this series from
> >
> > https://github.com/samitolvanen/linux.git lto-v8
>
> I've tried pulling this into my randconfig test tree to give it a spin.

Great, thank you for testing this!

> So far I have not managed to get a working build out of it. The main
> problem is that it is really slow to build because the link stage only
> uses one CPU. These are the other issues I've seen so far:

You may want to limit your testing to ThinLTO at first, because full
LTO is going to be extremely slow with larger configs, especially
when building arm64 kernels.
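
A minimal sketch of how a test tree could be pinned to ThinLTO with a
config fragment. The symbol names (LTO_CLANG_THIN, LTO_CLANG_FULL,
LTO_NONE) and the file name lto-thin.config are assumptions here, so
check arch/Kconfig in the tree you are building before relying on them:

```shell
# Write a small config fragment that selects ThinLTO.
# Assumption: the LTO choice uses these three symbol names.
cat > lto-thin.config <<'EOF'
CONFIG_LTO_CLANG_THIN=y
# CONFIG_LTO_CLANG_FULL is not set
# CONFIG_LTO_NONE is not set
EOF

# From the top of a kernel tree, the fragment could then be merged
# into an existing .config, for example:
#   ARCH=arm64 LLVM=1 scripts/kconfig/merge_config.sh .config lto-thin.config
```

With a randconfig flow, applying the fragment after the random config
is generated should keep the rest of the configuration unchanged.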

> - One build seems to take even longer to link. It's currently at 35GB
> RAM usage and 40 minutes into the final link, but I'm worried it might
> not complete before it runs out of memory. I only have 128GB installed,
> and google-chrome uses another 30GB of that, and I'm also doing some
> other builds in parallel. Is there a minimum recommended amount of
> memory for doing LTO builds?

When building arm64 defconfig, the maximum memory usage I measured
with ThinLTO was 3.5 GB, and with full LTO 20.3 GB. I haven't measured
larger configurations, but I believe LLD can easily consume 3-4x that
much with full LTO allyesconfig.
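
As a back-of-the-envelope sketch of that estimate, the snippet below
scales the measured 20.3 GB full-LTO peak by 3x and 4x. The helper name
lto_full_estimate is made up for illustration, and the result is an
extrapolation, not a measurement:

```shell
# Hypothetical helper: scale a measured full-LTO peak (in GB) by 3x and 4x.
lto_full_estimate() {
    awk -v peak="$1" 'BEGIN { printf "%.1f-%.1f GB\n", 3 * peak, 4 * peak }'
}

# 20.3 GB is the arm64 defconfig full-LTO peak quoted above.
lto_full_estimate 20.3   # prints "60.9-81.2 GB"
```

That range is for the link step alone, so anything else running on the
machine at the same time comes on top of it.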

> - One build failed with
> ld.lld -EL -maarch64elf -mllvm -import-instr-limit=5 -r -o vmlinux.o
> -T .tmp_initcalls.lds --whole-archive arch/arm64/kernel/head.o
> init/built-in.a usr/built-in.a arch/arm64/built-in.a kernel/built-in.a
> certs/built-in.a mm/built-in.a fs/built-in.a ipc/built-in.a
> security/built-in.a crypto/built-in.a block/built-in.a
> arch/arm64/lib/built-in.a lib/built-in.a drivers/built-in.a
> sound/built-in.a net/built-in.a virt/built-in.a --no-whole-archive
> --start-group arch/arm64/lib/lib.a lib/lib.a
> ./drivers/firmware/efi/libstub/lib.a --end-group
> "ld.lld: error: arch/arm64/kernel/head.o: invalid symbol index"
> after about 30 minutes

That's interesting. Did you use LLVM_IAS=1?

> - CONFIG_CPU_BIG_ENDIAN doesn't seem to work with lld, and LTO
> doesn't work with ld.bfd. I've added a CPU_LITTLE_ENDIAN dependency
> to ARCH_SUPPORTS_LTO_CLANG{,THIN}

Ah, good point. I'll fix this in v9.

[...]
> Not sure if these are all known issues. If there is one you'd like me
> to take a closer look at to find which config options break it, I can try.

No, none of these are known issues. I would be happy to take a closer
look if you can share configs that reproduce these.

Sami