Re: [GIT PULL v2] clang-lto for v5.12-rc1

From: Arnd Bergmann
Date: Wed Feb 24 2021 - 11:10:31 EST


On Wed, Feb 24, 2021 at 1:10 AM Alexander Lobakin <alobakin@xxxxx> wrote:
> From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> Date: Tue, 23 Feb 2021 12:33:05 -0800
> > On Tue, Feb 23, 2021 at 9:49 AM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > > On Mon, Feb 22, 2021 at 3:11 PM Kees Cook <keescook@xxxxxxxxxxxx> wrote:
> > > >
> > > > While x86 LTO enablement is done[1], it depends on some objtool
> > > > clean-ups[2], though it appears those actually have been in linux-next
> > > > (via tip/objtool/core), so it's possible that if that tree lands [..]
> > >
> > > That tree is actually next on my list of things to merge after this
> > > one, so it should be out soonish.
> >
> > "soonish" turned out to be later than I thought, because my "build
> > changes" set of pulls included the module change that I then wasted a
> > lot of time on trying to figure out why it slowed down my build so
> > much.
>
> I guess it's about the CONFIG_TRIM_UNUSED_KSYMS option you disabled in your tree.
> Well, it's actually widely used, mostly in the embedded world where
> there are often no out-of-tree modules, but a need to save as much
> space as possible.
> For full-blown systems and distributions it's mostly unnecessary, right.

Generally, CONFIG_TRIM_UNUSED_KSYMS helps most when combined
with either LTO or --gc-sections (CONFIG_LD_DEAD_CODE_DATA_ELIMINATION,
which depends on the architecture selecting
CONFIG_HAVE_LD_DEAD_CODE_DATA_ELIMINATION), though the effect
seems to be smaller than I expected. For example, on m68k:

   text    data     bss     dec     hex filename
4005135 1374302  167108 5546545  54a231 vmlinux-normal
3916254 1378078  167108 5461440  5355c0 vmlinux+trim
4012933 1362514  164280 5539727  54878f vmlinux+gcsection
3797884 1334194  164640 5296718  50d24e vmlinux+gcsection+trim
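A small sketch that recomputes the m68k savings from the numbers above,
taking the plain build as the baseline. The rows are transcribed by hand,
and savings_vs_baseline is a hypothetical helper, not a kernel script.

```python
# Rows copied from the m68k `size` output above:
# (text, data, bss, dec, filename).
M68K_SIZES = [
    (4005135, 1374302, 167108, 5546545, "vmlinux-normal"),
    (3916254, 1378078, 167108, 5461440, "vmlinux+trim"),
    (4012933, 1362514, 164280, 5539727, "vmlinux+gcsection"),
    (3797884, 1334194, 164640, 5296718, "vmlinux+gcsection+trim"),
]


def savings_vs_baseline(rows):
    """Return {filename: bytes saved in 'dec' relative to the first row}."""
    baseline = rows[0][3]
    result = {}
    for text, data, bss, dec, name in rows:
        # Sanity check: 'dec' is the decimal sum of text + data + bss.
        assert text + data + bss == dec, name
        result[name] = baseline - dec
    return result


if __name__ == "__main__":
    for name, saved in savings_vs_baseline(M68K_SIZES).items():
        print(f"{name:26s} {saved:8d} bytes saved")
```

With these rows, trim alone saves about 85 KB, gc-sections alone under
7 KB, and the two together about 250 KB.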

For arm64 defconfig, CONFIG_TRIM_UNUSED_KSYMS saves around
700KB on its own or when combined with either gc-sections or LTO,
but a full megabyte when all three are combined:

    text     data     bss      dec     hex filename
16570322 10998617  506468 28075407 1ac658f defconfig/vmlinux
16318793 10569913  506468 27395174 1a20466 trim_defconfig/vmlinux
16281234 10984848  504291 27770373 1a7be05 gc_defconfig/vmlinux
16029705 10556880  504355 27090940 19d5ffc gc+trim_defconfig/vmlinux
17040142 11102945  504196 28647283 1b51f73 thinlto_defconfig/vmlinux
16788613 10663201  504196 27956010 1aa932a thinlto+trim_defconfig/vmlinux
16347062 11043384  502499 27892945 1a99cd1 gc+thinlto_defconfig/vmlinux
15759453 10532792  502395 26794640 198da90 gc+thinlto+trim_defconfig/vmlinux
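To double-check the arm64 figures, a sketch that pairs each configuration
with its trimmed counterpart and subtracts the dec column. The dictionary
and the pairing are transcribed by hand from the table above; trim_savings
is a hypothetical helper, and "kB" here means 1000 bytes.

```python
# 'dec' column (text + data + bss) from the arm64 `size` output above.
ARM64_DEC = {
    "defconfig": 28075407,
    "trim_defconfig": 27395174,
    "gc_defconfig": 27770373,
    "gc+trim_defconfig": 27090940,
    "thinlto_defconfig": 28647283,
    "thinlto+trim_defconfig": 27956010,
    "gc+thinlto_defconfig": 27892945,
    "gc+thinlto+trim_defconfig": 26794640,
}

# Each build without CONFIG_TRIM_UNUSED_KSYMS, paired with the same build plus trim.
PAIRS = [
    ("defconfig", "trim_defconfig"),
    ("gc_defconfig", "gc+trim_defconfig"),
    ("thinlto_defconfig", "thinlto+trim_defconfig"),
    ("gc+thinlto_defconfig", "gc+thinlto+trim_defconfig"),
]


def trim_savings(dec, pairs):
    """Return {base config: bytes saved by enabling trim on top of it}."""
    return {base: dec[base] - dec[trimmed] for base, trimmed in pairs}


if __name__ == "__main__":
    for base, saved in trim_savings(ARM64_DEC, PAIRS).items():
        print(f"{base:22s} {saved / 1000:8.1f} kB saved by trim")
```

With these values the saving is roughly 680-690 KB for the first three
pairs and about 1.1 MB for gc-sections plus ThinLTO.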

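However, the combination of ThinLTO and trim indeed has a steep
cost in compile time: it takes about 1.75 times as long as a normal
defconfig build (580s vs. 332s elapsed), and trim alone adds about
35%. gc-sections makes the non-LTO builds slightly faster but has no
measurable effect on the ThinLTO builds.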

==== defconfig ====
332.001786355 seconds time elapsed
8599.464163000 seconds user
676.919635000 seconds sys
==== trim_defconfig ====
448.378576012 seconds time elapsed
10735.489271000 seconds user
964.006504000 seconds sys
==== gc_defconfig ====
324.347492236 seconds time elapsed
8465.785800000 seconds user
614.899797000 seconds sys
==== gc+trim_defconfig ====
429.188875620 seconds time elapsed
10203.759658000 seconds user
871.307973000 seconds sys
==== thinlto_defconfig ====
389.793540200 seconds time elapsed
9491.665320000 seconds user
664.858109000 seconds sys
==== thinlto+trim_defconfig ====
580.431820561 seconds time elapsed
11429.515538000 seconds user
1056.985745000 seconds sys
==== gc+thinlto_defconfig ====
389.484364525 seconds time elapsed
9473.831980000 seconds user
675.057675000 seconds sys
==== gc+thinlto+trim_defconfig ====
580.824912807 seconds time elapsed
11433.650337000 seconds user
1049.845569000 seconds sys
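The relative cost can be read off the elapsed times directly. A hedged
sketch that divides each wall-clock figure by the plain defconfig time;
the values are transcribed by hand and relative_build_time is a
hypothetical helper. It only looks at elapsed time, not user+sys CPU time.

```python
# Wall-clock "seconds time elapsed" values from the measurements above.
ELAPSED = {
    "defconfig": 332.001786355,
    "trim_defconfig": 448.378576012,
    "gc_defconfig": 324.347492236,
    "gc+trim_defconfig": 429.188875620,
    "thinlto_defconfig": 389.793540200,
    "thinlto+trim_defconfig": 580.431820561,
    "gc+thinlto_defconfig": 389.484364525,
    "gc+thinlto+trim_defconfig": 580.824912807,
}


def relative_build_time(elapsed, baseline="defconfig"):
    """Return {config: elapsed time as a multiple of the baseline build}."""
    base = elapsed[baseline]
    return {name: t / base for name, t in elapsed.items()}


if __name__ == "__main__":
    for name, ratio in relative_build_time(ELAPSED).items():
        print(f"{name:26s} {ratio:5.2f}x")
```

This gives roughly 1.35x for trim alone and about 1.75x for ThinLTO with
trim, with or without gc-sections.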

Arnd