Re: powerpc allyesconfig / allmodconfig linux-next next-20160729 - next-20160729 build failures

From: Nicholas Piggin
Date: Thu Aug 04 2016 - 07:47:31 EST


On Thu, 04 Aug 2016 12:37:41 +0200
Arnd Bergmann <arnd@xxxxxxxx> wrote:

> On Thursday, August 4, 2016 11:00:49 AM CEST Arnd Bergmann wrote:
> > I tried this
> >
> > diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
> > index b5e40ed86e60..89bca1a25916 100755
> > --- a/scripts/link-vmlinux.sh
> > +++ b/scripts/link-vmlinux.sh
> > @@ -44,7 +44,7 @@ modpost_link()
> > local objects
> >
> > if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
> > - objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
> > + objects="${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
> > else
> > objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
> > fi
> >
> > but that did not seem to change anything, the extra symbols are
> > still there. I have not tried to understand what that actually
> > does, so maybe I misunderstood your suggestion.
> >
>
> On a second attempt, I did the same change for vmlinux instead of the
> module (d'oh), and got a link failure instead:
>
>
> arch/arm/mm/proc-xscale.o: In function `cpu_xscale_do_resume':
> (.text+0x3d4): undefined reference to `cpu_resume_mmu'
> arch/arm/kernel/setup.o: In function `setup_arch':
> setup.c:(.init.text+0x910): undefined reference to `init_uts_ns'
> kernel/nsproxy.o:(.data+0x4): undefined reference to `init_uts_ns'
> kernel/sched/core.o: In function `update_rq_clock':
> core.c:(.text+0x6d8): undefined reference to `paravirt_steal_rq_enabled'
> core.c:(.text+0x6dc): undefined reference to `pv_time_ops'
> kernel/sched/cputime.o: In function `account_process_tick':
> cputime.c:(.text+0x794): undefined reference to `paravirt_steal_enabled'
> cputime.c:(.text+0x7a0): undefined reference to `pv_time_ops'
> kernel/locking/lockdep.o: In function `save_trace':
> lockdep.c:(.text+0xfe8): undefined reference to `save_stack_trace'
> kernel/module.o: In function `load_module':
> module.c:(.text+0x1b54): undefined reference to `elf_check_arch'
> module.c:(.text+0x2024): undefined reference to `apply_relocate'
> kernel/debug/debug_core.o: In function `kgdb_unregister_io_module':
> debug_core.c:(.text+0x2e4): undefined reference to `kgdb_arch_exit'
> kernel/debug/debug_core.o: In function `kgdb_arch_set_breakpoint':
> debug_core.c:(.text+0x3bc): undefined reference to `arch_kgdb_ops'
> kernel/debug/debug_core.o: In function `dbg_remove_all_break':
> debug_core.c:(.text+0x6d0): undefined reference to `arch_kgdb_ops'
> ...
>
> However, I also see a link failure in some rare configurations
> with just your patch:
>
> arch/arm/lib/lib.a(io-acorn.o): In function `outsl':
> (.text+0x38): undefined reference to `printk'
>
> The problem being a file in a library object that is not referenced,
> but that references another symbol that is not defined
> (CONFIG_PRINTK=n).

The first problem is the existing link system is buggy. I think an
unconditional switch to --whole-archive (at least for modular kernels)
should probably be done anyway. For example, on powerpc when building
with --whole-archive, I have:

+dma_noop_alloc
+dma_noop_free
+dma_noop_map_page
+dma_noop_mapping_error
+dma_noop_map_sg
+dma_noop_ops
+dma_noop_supported
+fdt_add_reservemap_entry
+fdt_begin_node
+fdt_create
+fdt_create_empty_tree
+fdt_end_node
+fdt_errtable
+find_cpio_data
+ioremap_page_range

find_cpio_data is unnecessary and it's a codesize regression to link it.
But dma_noop_ops and ioremap_page_range are exported symbols. If I
reference dma_noop_ops from some random module with otherwise unpatched
kernel:

ERROR: "dma_noop_ops" [drivers/char/bsr.ko] undefined!

The real problem is that our linkage requirements are like a shared
library when we build modular.

We could build a list of exports and make it link objects with those
symbols, to solve this, but IMO that's just wasting lipstick on a pig.
But I will to propose a patch to always use --whole-archive, thin
archives or not, and transition all archs over to it in a few release
cycles. It just works by luck right now.

Why is it a pig? Because having the linker to notice no external
references and just skipping the .o completely is trying to use a hammer
as a scalpel. It's just not a very effective way to eliminate dead code
-- I pulled in only a handful of unneeded functions by switching it.

I mean it is a quick simple feature that probably works well enough with
simple build systems. But not an advanced one that builds almost
everything on demand and also has loadable modules and must act like a
shared library.

Real linker DCE is a valid optimisation that can't be replaced by the
build system of course, but we need to do it properly. Here's what I'm
working on.

It applies on top of the previous patch I sent, plus some powerpc stuff
I'm working on that you should be able to just ignore for another arch.
it's a WIP, but if you can see if it works for arm that would be cool.

It doesn't actually build allyesconfig after this,
ld: .tmp_vmlinux1: Too many sections: 220655 (>= 65280)

But on a more reasonable configuration (ppc64le)
text data bss dec filename
11191672 1183536 1923820 14299028 vmlinux
10625528 861895 1919707 13407130 vmlinux.thin+gc

10M-552K 1M-314K ~ 13M-870K

And it actually boots too, which is fairly astounding considering that
it lost half a meg of code and 1/3 of its data. I'm not completely sure
I've not done something wrong...

Thanks,
Nick

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index e75e17c..1594072 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -104,6 +104,10 @@ LDFLAGS_vmlinux := $(LDFLAGS_vmlinux-y)
LDFLAGS_vmlinux += --emit-relocs
KBUILD_LDFLAGS_MODULE += --emit-relocs

+KBUILD_CFLAGS += -ffunction-sections -fdata-sections
+LDFLAGS_vmlinux += --gc-sections
+
+
ifeq ($(CONFIG_PPC64),y)
ifeq ($(call cc-option-yn,-mcmodel=medium),y)
# -mcmodel=medium breaks modules because it uses 32bit offsets from
@@ -234,6 +238,8 @@ KBUILD_CFLAGS += $(cpu-as-y)
archscripts: scripts_basic
$(Q)$(MAKE) $(build)=arch/powerpc/tools

+CFLAGS_head_$(CONFIG_WORD_SIZE).o = -fno-function-sections
+
head-y := arch/powerpc/kernel/head_$(CONFIG_WORD_SIZE).o
head-$(CONFIG_8xx) := arch/powerpc/kernel/head_8xx.o
head-$(CONFIG_40x) := arch/powerpc/kernel/head_40x.o
@@ -245,6 +251,7 @@ head-$(CONFIG_PPC_FPU) += arch/powerpc/kernel/fpu.o
head-$(CONFIG_ALTIVEC) += arch/powerpc/kernel/vector.o
head-$(CONFIG_PPC_OF_BOOT_TRAMPOLINE) += arch/powerpc/kernel/prom_init.o

+
core-y += arch/powerpc/kernel/ \
arch/powerpc/mm/ \
arch/powerpc/lib/ \
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 2da380f..b356e59 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -4,7 +4,10 @@

CFLAGS_ptrace.o += -DUTS_MACHINE='"$(UTS_MACHINE)"'

+ccflags-y += -fno-function-sections -fno-data-sections
+
subdir-ccflags-$(CONFIG_PPC_WERROR) := -Werror
+subdir-ccflags-y += -fno-function-sections -fno-data-sections

ifeq ($(CONFIG_PPC64),y)
CFLAGS_prom_init.o += $(NO_MINIMAL_TOC)
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 959c131..0856d62 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -56,16 +56,16 @@ SECTIONS
* in order to optimize stub generation.
*/
.head.text : AT(ADDR(.head.text) - LOAD_OFFSET) {
- *(.head.text.first_256B);
+ KEEP(*(.head.text.first_256B));
#ifndef CONFIG_PPC_BOOK3S
. = 0x100;
#else
- *(.head.text.real_vectors);
- *(.head.text.real_trampolines);
- *(.head.text.virt_vectors);
- *(.head.text.virt_trampolines);
+ KEEP(*(.head.text.real_vectors));
+ KEEP(*(.head.text.real_trampolines));
+ KEEP(*(.head.text.virt_vectors));
+ KEEP(*(.head.text.virt_trampolines));
#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
- *(.head.data.fwnmi_page);
+ KEEP(*(.head.data.fwnmi_page));
. = 0x8000;
#else
. = 0x7000;
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 6a67ab9..3a35719 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -312,76 +312,76 @@
/* Kernel symbol table: Normal symbols */ \
__ksymtab : AT(ADDR(__ksymtab) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab) = .; \
- *(SORT(___ksymtab+*)) \
+ KEEP(*(SORT(___ksymtab+*))) \
VMLINUX_SYMBOL(__stop___ksymtab) = .; \
} \
\
/* Kernel symbol table: GPL-only symbols */ \
__ksymtab_gpl : AT(ADDR(__ksymtab_gpl) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab_gpl) = .; \
- *(SORT(___ksymtab_gpl+*)) \
+ KEEP(*(SORT(___ksymtab_gpl+*))) \
VMLINUX_SYMBOL(__stop___ksymtab_gpl) = .; \
} \
\
/* Kernel symbol table: Normal unused symbols */ \
__ksymtab_unused : AT(ADDR(__ksymtab_unused) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab_unused) = .; \
- *(SORT(___ksymtab_unused+*)) \
+ KEEP(*(SORT(___ksymtab_unused+*))) \
VMLINUX_SYMBOL(__stop___ksymtab_unused) = .; \
} \
\
/* Kernel symbol table: GPL-only unused symbols */ \
__ksymtab_unused_gpl : AT(ADDR(__ksymtab_unused_gpl) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab_unused_gpl) = .; \
- *(SORT(___ksymtab_unused_gpl+*)) \
+ KEEP(*(SORT(___ksymtab_unused_gpl+*))) \
VMLINUX_SYMBOL(__stop___ksymtab_unused_gpl) = .; \
} \
\
/* Kernel symbol table: GPL-future-only symbols */ \
__ksymtab_gpl_future : AT(ADDR(__ksymtab_gpl_future) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___ksymtab_gpl_future) = .; \
- *(SORT(___ksymtab_gpl_future+*)) \
+ KEEP(*(SORT(___ksymtab_gpl_future+*))) \
VMLINUX_SYMBOL(__stop___ksymtab_gpl_future) = .; \
} \
\
/* Kernel symbol table: Normal symbols */ \
__kcrctab : AT(ADDR(__kcrctab) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___kcrctab) = .; \
- *(SORT(___kcrctab+*)) \
+ KEEP(*(SORT(___kcrctab+*))) \
VMLINUX_SYMBOL(__stop___kcrctab) = .; \
} \
\
/* Kernel symbol table: GPL-only symbols */ \
__kcrctab_gpl : AT(ADDR(__kcrctab_gpl) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___kcrctab_gpl) = .; \
- *(SORT(___kcrctab_gpl+*)) \
+ KEEP(*(SORT(___kcrctab_gpl+*))) \
VMLINUX_SYMBOL(__stop___kcrctab_gpl) = .; \
} \
\
/* Kernel symbol table: Normal unused symbols */ \
__kcrctab_unused : AT(ADDR(__kcrctab_unused) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___kcrctab_unused) = .; \
- *(SORT(___kcrctab_unused+*)) \
+ KEEP(*(SORT(___kcrctab_unused+*))) \
VMLINUX_SYMBOL(__stop___kcrctab_unused) = .; \
} \
\
/* Kernel symbol table: GPL-only unused symbols */ \
__kcrctab_unused_gpl : AT(ADDR(__kcrctab_unused_gpl) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___kcrctab_unused_gpl) = .; \
- *(SORT(___kcrctab_unused_gpl+*)) \
+ KEEP(*(SORT(___kcrctab_unused_gpl+*))) \
VMLINUX_SYMBOL(__stop___kcrctab_unused_gpl) = .; \
} \
\
/* Kernel symbol table: GPL-future-only symbols */ \
__kcrctab_gpl_future : AT(ADDR(__kcrctab_gpl_future) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start___kcrctab_gpl_future) = .; \
- *(SORT(___kcrctab_gpl_future+*)) \
+ KEEP(*(SORT(___kcrctab_gpl_future+*))) \
VMLINUX_SYMBOL(__stop___kcrctab_gpl_future) = .; \
} \
\
/* Kernel symbol table: strings */ \
__ksymtab_strings : AT(ADDR(__ksymtab_strings) - LOAD_OFFSET) { \
- *(__ksymtab_strings) \
+ KEEP(*(__ksymtab_strings)) \
} \
\
/* __*init sections */ \
@@ -519,6 +519,7 @@

/* init and exit section handling */
#define INIT_DATA \
+ KEEP(*(SORT(___kentry+*))) \
*(.init.data) \
MEM_DISCARD(init.data) \
KERNEL_CTORS() \
@@ -695,9 +696,9 @@
#define INIT_RAM_FS \
. = ALIGN(4); \
VMLINUX_SYMBOL(__initramfs_start) = .; \
- *(.init.ramfs) \
+ KEEP(*(.init.ramfs)) \
. = ALIGN(8); \
- *(.init.ramfs.info)
+ KEEP(*(.init.ramfs.info))
#else
#define INIT_RAM_FS
#endif
diff --git a/include/linux/export.h b/include/linux/export.h
index 2f9ccbe..a921862 100644
--- a/include/linux/export.h
+++ b/include/linux/export.h
@@ -46,7 +46,7 @@ extern struct module __this_module;
extern __visible void *__crc_##sym __attribute__((weak)); \
static const unsigned long __kcrctab_##sym \
__used \
- __attribute__((section("___kcrctab" sec "+" #sym), unused)) \
+ __attribute__((section("___kcrctab" sec "+" #sym ",\"a\",@note #"), used)) \
= (unsigned long) &__crc_##sym;
#else
#define __CRC_SYMBOL(sym, sec)
@@ -57,12 +57,12 @@ extern struct module __this_module;
extern typeof(sym) sym; \
__CRC_SYMBOL(sym, sec) \
static const char __kstrtab_##sym[] \
- __attribute__((section("__ksymtab_strings"), aligned(1))) \
+ __attribute__((section("__ksymtab_strings" ",\"a\",@note #"), aligned(1))) \
= VMLINUX_SYMBOL_STR(sym); \
extern const struct kernel_symbol __ksymtab_##sym; \
__visible const struct kernel_symbol __ksymtab_##sym \
__used \
- __attribute__((section("___ksymtab" sec "+" #sym), unused)) \
+ __attribute__((section("___ksymtab" sec "+" #sym ",\"a\",@note #"), used)) \
= { (unsigned long)&sym, __kstrtab_##sym }

#if defined(__KSYM_DEPS__)
diff --git a/include/linux/init.h b/include/linux/init.h
index aedb254..51393f4 100644
--- a/include/linux/init.h
+++ b/include/linux/init.h
@@ -156,19 +156,20 @@ extern bool initcall_debug;

#ifndef __ASSEMBLY__

-#ifdef CONFIG_LTO
+#if 1
/* Work around a LTO gcc problem: when there is no reference to a variable
* in a module it will be moved to the end of the program. This causes
* reordering of initcalls which the kernel does not like.
* Add a dummy reference function to avoid this. The function is
* deleted by the linker.
*/
-#define LTO_REFERENCE_INITCALL(x) \
- ; /* yes this is needed */ \
- static __used __exit void *reference_##x(void) \
- { \
- return &x; \
- }
+#define LTO_REFERENCE_INITCALL(sym) \
+ extern typeof(sym) sym; \
+ /* extern const unsigned long __kentry_##sym; */ \
+ static /* __visible */ const unsigned long __kentry_##sym \
+ __used \
+ __attribute__((section("___kentry" "+" #sym ",\"a\",@note #"), used)) \
+ = (unsigned long)&sym;
#else
#define LTO_REFERENCE_INITCALL(x)
#endif
@@ -222,16 +223,18 @@ extern bool initcall_debug;

#define __initcall(fn) device_initcall(fn)

-#define __exitcall(fn) \
- static exitcall_t __exitcall_##fn __exit_call = fn
+#define __exitcall(fn) \
+ static exitcall_t __exitcall_##fn __exit_call = fn; \

-#define console_initcall(fn) \
- static initcall_t __initcall_##fn \
- __used __section(.con_initcall.init) = fn
+#define console_initcall(fn) \
+ static initcall_t __initcall_##fn \
+ __used __section(.con_initcall.init) = fn; \
+ LTO_REFERENCE_INITCALL(__initcall_##fn)

-#define security_initcall(fn) \
- static initcall_t __initcall_##fn \
- __used __section(.security_initcall.init) = fn
+#define security_initcall(fn) \
+ static initcall_t __initcall_##fn \
+ __used __section(.security_initcall.init) = fn; \
+ LTO_REFERENCE_INITCALL(__initcall_##fn)

struct obs_kernel_param {
const char *str;
diff --git a/init/Makefile b/init/Makefile
index 7bc47ee..c4fb455 100644
--- a/init/Makefile
+++ b/init/Makefile
@@ -2,6 +2,8 @@
# Makefile for the linux kernel.
#

+ccflags-y := -fno-function-sections -fno-data-sections
+
obj-y := main.o version.o mounts.o
ifneq ($(CONFIG_BLK_DEV_INITRD),y)
obj-y += noinitramfs.o
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh
index ef4658f..fb848af 100755
--- a/scripts/link-vmlinux.sh
+++ b/scripts/link-vmlinux.sh
@@ -37,17 +37,22 @@ info()
fi
}

+# Grab all the EXPORT_SYMBOL symbols in the vmlinux build
+# ${1} - output file
+exports_extract()
+{
+ ${NM} -g ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} |
+ grep "R __ksymtab_" |
+ sed 's/.*__ksymtab_\(.*\)$/\1/' > ${1}
+}
+
# Link of vmlinux.o used for section mismatch analysis
# ${1} output file
modpost_link()
{
local objects

- if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
- objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
- else
- objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
- fi
+ objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
${LD} ${LDFLAGS} -r -o ${1} ${objects}
}

@@ -60,11 +65,7 @@ vmlinux_link()
local objects

if [ "${SRCARCH}" != "um" ]; then
- if [ -n "${CONFIG_THIN_ARCHIVES}" ]; then
- objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN} --no-whole-archive"
- else
- objects="${KBUILD_VMLINUX_INIT} --start-group ${KBUILD_VMLINUX_MAIN} --end-group"
- fi
+ objects="--whole-archive ${KBUILD_VMLINUX_INIT} ${KBUILD_VMLINUX_MAIN}"
${LD} ${LDFLAGS} ${LDFLAGS_vmlinux} -o ${2} \
-T ${lds} ${objects} ${1}
else