Re: [PATCH] [RFC] arm64: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION

From: Sedat Dilek
Date: Fri Feb 26 2021 - 04:07:05 EST


On Fri, Feb 26, 2021 at 9:14 AM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
>
> On Fri, Feb 26, 2021 at 1:36 AM Sedat Dilek <sedat.dilek@xxxxxxxxx> wrote:
> >
> > On Thu, Feb 25, 2021 at 12:21 PM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
> > >
> > > From: Arnd Bergmann <arnd@xxxxxxxx>
> > >
> > > When looking at kernel size optimizations, I found that arm64
> > > does not currently support HAVE_LD_DEAD_CODE_DATA_ELIMINATION,
> > > which enables the --gc-sections flag to the linker.
> > >
> > > I see that for a defconfig build with llvm, there are some
> > > notable improvements from enabling this, in particular when
> > > combined with the recently added CONFIG_LTO_CLANG_THIN
> > > and CONFIG_TRIM_UNUSED_KSYMS:
> > >
> > >     text      data     bss       dec      hex filename
> > > 16570322  10998617  506468  28075407  1ac658f defconfig/vmlinux
> > > 16318793  10569913  506468  27395174  1a20466 trim_defconfig/vmlinux
> > > 16281234  10984848  504291  27770373  1a7be05 gc_defconfig/vmlinux
> > > 16029705  10556880  504355  27090940  19d5ffc gc+trim_defconfig/vmlinux
> > > 17040142  11102945  504196  28647283  1b51f73 thinlto_defconfig/vmlinux
> > > 16788613  10663201  504196  27956010  1aa932a thinlto+trim_defconfig/vmlinux
> > > 16347062  11043384  502499  27892945  1a99cd1 gc+thinlto_defconfig/vmlinux
> > > 15759453  10532792  502395  26794640  198da90 gc+thinlto+trim_defconfig/vmlinux
> > >
> >
> > Thanks for the numbers.
> > Does CONFIG_TRIM_UNUSED_KSYMS=y have a negative impact on build time or
> > disk usage (that is, longer builds or bigger output)?
> > Do you have build times for the configurations above?
>
> They are in the mailing list archive I linked to:
>
> ==== defconfig ====
> 332.001786355 seconds time elapsed
> 8599.464163000 seconds user
> 676.919635000 seconds sys
> ==== trim_defconfig ====
> 448.378576012 seconds time elapsed
> 10735.489271000 seconds user
> 964.006504000 seconds sys
> ==== gc_defconfig ====
> 324.347492236 seconds time elapsed
> 8465.785800000 seconds user
> 614.899797000 seconds sys
> ==== gc+trim_defconfig ====
> 429.188875620 seconds time elapsed
> 10203.759658000 seconds user
> 871.307973000 seconds sys
> ==== thinlto_defconfig ====
> 389.793540200 seconds time elapsed
> 9491.665320000 seconds user
> 664.858109000 seconds sys
> ==== thinlto+trim_defconfig ====
> 580.431820561 seconds time elapsed
> 11429.515538000 seconds user
> 1056.985745000 seconds sys
> ==== gc+thinlto_defconfig ====
> 389.484364525 seconds time elapsed
> 9473.831980000 seconds user
> 675.057675000 seconds sys
> ==== gc+thinlto+trim_defconfig ====
> 580.824912807 seconds time elapsed
> 11433.650337000 seconds user
> 1049.845569000 seconds sys
>

Thanks for the numbers, Arnd.
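
To put the size table from the start of the thread into relative terms, here is a rough sketch that computes the saving of each configuration against plain defconfig. It uses only the "dec" column from that table; the names DEC_BYTES and saving_percent are made up for this illustration.

```python
# Total image size in bytes ("dec" column), copied from the size table
# quoted earlier in this thread.
DEC_BYTES = {
    "defconfig": 28075407,
    "trim_defconfig": 27395174,
    "gc_defconfig": 27770373,
    "gc+trim_defconfig": 27090940,
    "thinlto_defconfig": 28647283,
    "thinlto+trim_defconfig": 27956010,
    "gc+thinlto_defconfig": 27892945,
    "gc+thinlto+trim_defconfig": 26794640,
}


def saving_percent(config, baseline="defconfig"):
    """Size reduction of `config` relative to `baseline`, in percent.

    A negative result means the image got bigger.
    """
    base = DEC_BYTES[baseline]
    return 100.0 * (base - DEC_BYTES[config]) / base


if __name__ == "__main__":
    for name in DEC_BYTES:
        print(f"{name:28s} {saving_percent(name):+6.2f}%")
```

By this measure --gc-sections on its own saves roughly 1.1%, and the gc+thinlto+trim combination roughly 4.6%, while ThinLTO alone makes the image slightly bigger.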

> So HAVE_LD_DEAD_CODE_DATA_ELIMINATION is a small improvement
> on build time (since it can spend less time linking), while
> CONFIG_TRIM_UNUSED_KSYMS slows it down quite a bit. Combining
> CONFIG_TRIM_UNUSED_KSYMS with CONFIG_LTO_CLANG_THIN is really
> slow because most of the time is spent in the final link (especially
> when you have many CPU cores to do the earlier bits quickly), and
> with trimming enabled the link is done twice.
>

My first pre-v5.12-rc1 kernel build had Clang ThinLTO enabled.
For the following builds I switched to Sami's Clang CFI.
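
To quantify the build-time remarks quoted above, this small sketch converts the "time elapsed" values into a slowdown relative to a baseline. The numbers are copied from the quoted output; the names ELAPSED_SECONDS and slowdown_percent are made up for this illustration.

```python
# Wall-clock build time in seconds ("seconds time elapsed"), copied from
# the timing output quoted earlier in this thread.
ELAPSED_SECONDS = {
    "defconfig": 332.001786355,
    "trim_defconfig": 448.378576012,
    "gc_defconfig": 324.347492236,
    "gc+trim_defconfig": 429.188875620,
    "thinlto_defconfig": 389.793540200,
    "thinlto+trim_defconfig": 580.431820561,
    "gc+thinlto_defconfig": 389.484364525,
    "gc+thinlto+trim_defconfig": 580.824912807,
}


def slowdown_percent(config, baseline="defconfig"):
    """Extra wall-clock time of `config` relative to `baseline`, in percent.

    A negative result means the build got faster.
    """
    base = ELAPSED_SECONDS[baseline]
    return 100.0 * (ELAPSED_SECONDS[config] - base) / base


if __name__ == "__main__":
    for name in ELAPSED_SECONDS:
        print(f"{name:28s} {slowdown_percent(name):+7.2f}%")
    # Cost of trimming on top of ThinLTO, rather than on top of defconfig.
    extra = slowdown_percent("thinlto+trim_defconfig", "thinlto_defconfig")
    print(f"trim on top of thinlto       {extra:+7.2f}%")
```

This matches the summary: gc alone is about 2% faster than defconfig, trim alone about 35% slower, and trim on top of ThinLTO adds close to 50% to the ThinLTO build.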

> > BTW, is CONFIG_LD_DEAD_CODE_DATA_ELIMINATION=y settable for x86 (64-bit)?
> > (I did not look or check for it.)
>
> No, in mainline, HAVE_LD_DEAD_CODE_DATA_ELIMINATION is currently
> only selected on MIPS and PowerPC. I only sent experimental patches to
> enable it on arm64 and m68k, but have not tried booting them. If you
> select the symbol on x86, you should see similar results.
>

OK, I see:

$ git grep HAVE_LD_DEAD_CODE_DATA_ELIMINATION arch/mips/
arch/mips/Kconfig: select HAVE_LD_DEAD_CODE_DATA_ELIMINATION

$ git grep HAVE_LD_DEAD_CODE_DATA_ELIMINATION arch/powerpc/
arch/powerpc/Kconfig: select HAVE_LD_DEAD_CODE_DATA_ELIMINATION

So, I need to add this to arch/x86/Kconfig.
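
As a quick way to re-check this after editing a Kconfig file, here is a hypothetical helper that reads `git grep` output like the above and lists the architectures that select the symbol. The function name arches_selecting and the regular expression are my own assumptions, not anything from the kernel tree.

```python
import re


def arches_selecting(grep_output, symbol="HAVE_LD_DEAD_CODE_DATA_ELIMINATION"):
    """Return the sorted arch names whose Kconfig files select `symbol`.

    Expects lines in the `path:content` form that `git grep` prints,
    for example "arch/mips/Kconfig: select HAVE_...".
    """
    pattern = re.compile(
        r"^arch/([^/]+)/Kconfig[^:]*:\s*select\s+" + re.escape(symbol) + r"\b"
    )
    arches = set()
    for line in grep_output.splitlines():
        match = pattern.match(line)
        if match:
            arches.add(match.group(1))
    return sorted(arches)


# Sample input copied from the two git grep results above.
SAMPLE = """\
arch/mips/Kconfig: select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
arch/powerpc/Kconfig: select HAVE_LD_DEAD_CODE_DATA_ELIMINATION
"""

if __name__ == "__main__":
    print(arches_selecting(SAMPLE))
```

Once the select line is added to arch/x86/Kconfig, feeding the output of `git grep HAVE_LD_DEAD_CODE_DATA_ELIMINATION arch/` to this function should list x86 as well.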

Do you happen to know whether changes to arch/x86/kernel/vmlinux.lds.S
(sections) are needed?

Last question:
Over the last few days I have seen a lot of fixes touching inlining with
LLVM/Clang v13-git.
Which git tag are you using?
What has your experience been?
Are there any pending patches (kernel-side)?

I use:
$ /opt/llvm-toolchain/bin/clang --version
dileks clang version 13.0.0 (https://github.com/llvm/llvm-project.git c465429f286f50e52a8d2b3b39f38344f3381cce)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/llvm-toolchain/bin

My LLVM toolchain is itself built with ThinLTO+PGO, optimized for Linux kernel builds.

- Sedat -