Re: [PATCH] [RFC] arm64: enable HAVE_LD_DEAD_CODE_DATA_ELIMINATION
From: Nicholas Piggin
Date: Sun Feb 28 2021 - 20:18:53 EST
Excerpts from Arnd Bergmann's message of February 27, 2021 7:49 pm:
> On Fri, Feb 26, 2021 at 10:13 PM 'Fangrui Song' via Clang Built Linux
> <clang-built-linux@xxxxxxxxxxxxxxxx> wrote:
>>
>> For folks who are interested in --gc-sections on metadata sections,
>> I want to make you aware of the implications of __start_/__stop_
>> symbols and C identifier name sections.
>> You can see https://github.com/ClangBuiltLinux/linux/issues/1307 for a summary.
>> (Its linked blog article has some examples.)
>>
>> In the kernel linker scripts, most C identifier name sections begin with double-underscore __.
>> Some are surrounded by `KEEP(...)`, some are not.
>>
>> * A `KEEP` keyword has GC root semantics and makes ld --gc-sections ineffective.
>> * Without `KEEP`, __start_/__stop_ references from a live input section
>> can unnecessarily retain all the associated C identifier name input
>> sections. The new ld.lld option `-z start-stop-gc` can defeat this rule.
>>
>> As an example, a __start___jump_table reference from a live section
>> causes all `__jump_table` input sections to be retained, even if you
>> change `KEEP(__jump_table)` to `(__jump_table)`.
>> (If you change the symbol name from `__start_${section}` to something
>> else (e.g. `__start${section}`), the rule will not apply.)
>
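For anyone following along, here is a minimal userspace sketch of the pattern being described. It is not kernel code: the section name `demo_table`, `struct demo_entry`, the `DEMO_ENTRY` macro and both helpers are made-up names for illustration, and it assumes an ELF target linked with GNU ld or ld.lld. Because `demo_table` is a valid C identifier, the linker provides `__start_demo_table` and `__stop_demo_table` when they are referenced.

```c
#include <stddef.h>

/* Hypothetical table entry; the kernel's real tables use other types. */
struct demo_entry {
    int value;
};

/*
 * Put each entry into a section whose name is a valid C identifier.
 * "used" only stops the compiler from discarding the otherwise
 * unreferenced static objects.
 */
#define DEMO_ENTRY(name, v)                                   \
    static const struct demo_entry name                       \
    __attribute__((used, section("demo_table"))) = { .value = (v) }

DEMO_ENTRY(entry_a, 1);
DEMO_ENTRY(entry_b, 2);
DEMO_ENTRY(entry_c, 3);

/* Linker-provided bounds of the "demo_table" output section. */
extern const struct demo_entry __start_demo_table[];
extern const struct demo_entry __stop_demo_table[];

/* Number of entries that ended up in the table. */
size_t demo_table_count(void)
{
    return (size_t)(__stop_demo_table - __start_demo_table);
}

/* Walk the table the same way kernel code walks __ex_table and friends. */
int demo_table_sum(void)
{
    int sum = 0;

    for (const struct demo_entry *e = __start_demo_table;
         e < __stop_demo_table; e++)
        sum += e->value;
    return sum;
}
```

As I understand the rule above, it is the `__start_demo_table` reference from a live function like `demo_table_sum()` that would retain every `demo_table` input section under --gc-sections, whether or not a given entry is otherwise reachable.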
> I suspect the __start_* symbols are cargo-culted by many developers
> copying stuff around between kernel linker scripts, that's certainly how I
> approach making changes to it normally without a deeper understanding
> of how the linker actually works or what the different bits of syntax mean
> there.
>
> I see the original vmlinux.lds linker script showed up in linux-2.1.23, and
> it contained
>
> + . = ALIGN(16); /* Exception table */
> + __start___ex_table = .;
> + __ex_table : { *(__ex_table) }
> + __stop___ex_table = .;
> +
> + __start___ksymtab = .; /* Kernel symbol table */
> + __ksymtab : { *(__ksymtab) }
> + __stop___ksymtab = .;
>
> originally for arch/sparc, and shortly afterwards for i386. The magic
> __ex_table section was first used in linux-2.1.7 without a linker
> script. It's probably a good idea to try cleaning these up by using
> non-magic start/stop symbols for all sections, and relying on KEEP()
> instead where needed.
>
>> There is a lot of KEEP usage. Perhaps some can be dropped to facilitate
>> ld --gc-sections.
>
> I see a lot of these were added by Nick Piggin (added to Cc) in this commit:
>
> commit 266ff2a8f51f02b429a987d87634697eb0d01d6a
> Author: Nicholas Piggin <npiggin@xxxxxxxxx>
> Date: Wed May 9 22:59:58 2018 +1000
>
> kbuild: Fix asm-generic/vmlinux.lds.h for LD_DEAD_CODE_DATA_ELIMINATION
>
> KEEP more tables, and add the function/data section wildcard to more
> section selections.
>
> This is a little ad-hoc at the moment, but kernel code should be moved
> to consistently use .text..x (note: double dots) for explicit sections
> and all references to it in the linker script can be made with
> TEXT_MAIN, and similarly for other sections.
>
> For now, let's see if major architectures move to enabling this option
> then we can do some refactoring passes. Otherwise if it remains unused
> or superseded by LTO, this may not be required.
>
> Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
> Signed-off-by: Masahiro Yamada <yamada.masahiro@xxxxxxxxxxxxx>
>
> which apparently was intentionally cautious.
>
> Unlike what Nick expected in his submission, I now think the annotations
> will be needed for LTO just like they are for --gc-sections.

Yeah, I wasn't sure exactly what LTO looks like or how it would work.
I thought perhaps LTO might be able to find dead code with circular /
back references; we could put references from the code back to these
tables or something so they would be kept without KEEP. I don't know, I
was handwaving!

I managed to get powerpc (and IIRC x86?) working with --gc-sections using
those KEEP annotations, but the effectiveness is of course far worse than
what Nicolas was able to achieve with all his techniques and tricks.

But yes, unless there is some other mechanism to handle these tables,
then KEEP probably has to stay. I suggest this wants a very explicit and
systematic way to handle it (maybe with some toolchain support) rather
than trying to just remove things case by case and see what breaks.

I don't know if Nicolas is still working on his shrinking patches
recently, but he probably knows more than anyone about this stuff.

Thanks,
Nick