Due to popular demand (well, one or two people asked for it, anyway) I've
updated my dead function optimisation patch to apply to the latest 2.4 series
kernels. I'm very keen to get feedback from people who can test or otherwise
critique this on other platforms, so please let me know what you think!
Dead Function Optimisation for Linux 2.4 Updated: 19 July 2000
---------------------------------------- ---------------------
The following patch allows gcc/ld to automatically optimise unused functions
and data out of the kernel. In particular functions which aren't ever called
will now be optimised away, even if other functions in the same object file
are called.
I consider this less error prone than relying on other people to wrap all
combinations of their unused functions in #ifdef CONFIG_..., and it's
particularly good if you're building a kernel for embedded systems which don't
use stuff like /proc fs, support for which isn't always wrapped in #ifdefs.
Doing this has turned out to be distinctly non-trivial, so I've described the
changes in detail below. I can't help thinking that the linker should be
able to do a better job on its own without so much mucking about, but for the
moment --gc-sections appears to be the only viable method.
The changes should work for all architectures in the 2.4.x tree. So far
I've been told it works by people who have tested it on these architectures
(if yours isn't listed here, please tell me that it works!):
ppc
I'd really like to hear from people who can try it on other architectures too.
If you're working on an architecture that isn't in the official source tree,
make equivalent mods to those described below and please let me know how it
goes.
This version of the patch has been generated against linux-2.4.0-test4-pre6.
Hopefully it will apply cleanly against any 2.4 kernel.
The latest version of this patch is available at:
http://members.xoom.com/greyhams/linux/patches/2.4/funcsect.patch
This patch won't apply cleanly against the 2.2 series, but I have another
patch for 2.2 available here:
http://members.xoom.com/greyhams/linux/patches/2.2/funcsect.patch
Here is the rationale behind what I've done in the patch:
1. enable gcc's -ffunction-sections/-fdata-sections options, and ld's
--gc-sections option in the top Makefile, which together work all the
magic. The Makefile auto-detects when these flags are supported, so you
need a recent enough gcc and binutils to have them actually turn on.
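To illustrate what these flags do, here is a minimal sketch (not code from the patch; the file name and all identifiers are made up for the example) of a translation unit in which only one function is expected to be referenced:

```c
/*
 * deadfunc.c: hypothetical example, not part of the patch.
 *
 * Compiled with something like
 *     gcc -ffunction-sections -fdata-sections -c deadfunc.c
 * gcc emits each function and data object into its own section instead
 * of lumping everything into .text and .data. A final link with
 * --gc-sections can then discard any section that nothing refers to.
 */

/* Goes into .text.used_function; kept as long as something calls it. */
int used_function(int x)
{
	return x * 2;
}

/* Goes into .text.unused_function; a candidate for removal. */
int unused_function(int x)
{
	return x + 100;
}

/* Goes into .data.unused_table; likewise a candidate for removal. */
int unused_table[4] = { 1, 2, 3, 4 };
```

Running "objdump -h deadfunc.o" on the result should list the per-function sections, which is a quick way to confirm your gcc actually honours the flags.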
That causes a whole host of stuff to break, so fix the resulting damage:
2. The section namespace used by -ffunction-sections (.text.*) clashes with
those used by some of linux's headers, so I renamed the sections:
include/linux/init.h's:
.text.init -> .init.text
.text.exit -> .exit.text
.data.init -> .init.data
.data.exit -> .exit.data
.setup.init -> .init.setup for consistency
.initcall.init -> .init.call for consistency
.exitcall.exit -> .exit.call for consistency
rwlock.h, semaphore.h, spinlock.h:
.text.lock -> .lock.text
include/asm-*/cache.h:
.data.cacheline_aligned -> .cacheline_aligned.data
And for consistency in a few .lds files:
.data.page_aligned -> .page_aligned.data
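As a rough sketch of what the renaming means at the source level, the section-placement macros end up looking something like the following. The macro bodies are simplified from the style used in include/linux/init.h, and example_value and example_setup are hypothetical names used only for illustration:

```c
/*
 * Simplified sketch of section-placement macros using the renamed
 * sections. Not the literal contents of include/linux/init.h.
 */
#define __init     __attribute__((__section__(".init.text")))
#define __initdata __attribute__((__section__(".init.data")))

/* Lands in .init.data, so a .data.* pattern no longer matches it. */
int example_value __initdata = 42;

/* Lands in .init.text, so a .text.* pattern no longer matches it. */
int __init example_setup(void)
{
	return example_value;
}
```

The point of moving the distinguishing word to the front is that the names gcc generates (.text.<function>, .data.<object>) can then be matched with a wildcard without also sweeping up the kernel's own special sections.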
3. The user space exception fixup __ex_table search uses a binary chop. This
relies on references to instructions which may fault on user space accesses
being in the table in ascending address order. Unfortunately, a bug in all
linker versions prior to binutils-2.10 reverses the order of the orphan
.text.* sections generated using -ffunction-sections when an intermediate
"ld -r" is done, causing the binary search to fail.
I tried a number of approaches to work around this, including using "ar"
instead, and asking the binutils folk to fix the linker bug, which they
have graciously done. I don't want to force people to upgrade though, so...
My preferred solution (used in this patch) is to put all the __ex_table
entries from separate functions into separate sections named
__ex_table.__FUNCTION__, mirroring what -ffunction-sections does with
.text . Since the linker bug reorders both sets of sections consistently,
this keeps the __ex_table sorted with respect to the output .text.* section
ordering both with and without the linker bug fixed. Hence there should be
no need to upgrade to the latest binutils in order for this patch to work.
Assembler code continues to lump everything in .text, with ex_table entries
in __ex_table. Hence, the patch won't attempt to optimise away dead
assembler functions. I haven't bothered to add all the explicit .section
ops required for this to happen.
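To see why the ordering matters, here is a minimal sketch of a binary search over an exception table. It is an illustration under simplifying assumptions, not the kernel's actual search routine: struct ex_entry and search_ex_table are hypothetical names, and real entries hold kernel addresses rather than small constants.

```c
#include <stddef.h>

/* Simplified stand-in for an exception table entry. */
struct ex_entry {
	unsigned long insn;   /* address of an instruction that may fault */
	unsigned long fixup;  /* address to resume at if it does */
};

/*
 * Binary search for a faulting address. This is only correct if the
 * table is sorted by ascending insn, which is exactly the property the
 * section ordering has to preserve.
 * Returns the fixup address, or 0 if no entry matches.
 */
unsigned long search_ex_table(const struct ex_entry *table, size_t count,
			      unsigned long addr)
{
	size_t lo = 0, hi = count;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (table[mid].insn == addr)
			return table[mid].fixup;
		if (table[mid].insn < addr)
			lo = mid + 1;
		else
			hi = mid;
	}
	return 0;
}
```

If the same entries are fed in descending order, the search can miss addresses that are present in the table, which is how a reordered __ex_table turns a recoverable fault into an oops.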
4. get_wchan needs to be able to identify scheduler functions. It used to do
this by declaring magic stub functions 'scheduling_functions_start_here'
and 'scheduling_functions_end_here'. That was a gross hack, and trips over
the linker bug mentioned above when the stubs get reordered. I changed it
to put all scheduler functions in a section named .sched.text using a macro
named __sched, similar to __init. I think this is a cleaner solution: it
no longer relies on function ordering, and it's the same as the way __init
functions are handled.
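A minimal sketch of the idea follows. The __sched macro is written in the style of __init; example_sleep_on and in_sched_text are hypothetical names, and the section bounds are passed in as parameters so the sketch can be built in user space. In the kernel the bounds would be the __sched_begin and __sched_end symbols from the .lds file.

```c
#include <stdint.h>

/* Section-placement macro in the style of __init (simplified sketch). */
#define __sched __attribute__((__section__(".sched.text")))

/* Hypothetical scheduler function tagged with the macro. */
int __sched example_sleep_on(int ticks)
{
	return ticks > 0 ? ticks : 0;
}

/*
 * Range test of the kind get_wchan needs: is this program counter
 * inside the scheduler's text? begin is inclusive, end is exclusive.
 */
int in_sched_text(uintptr_t pc, uintptr_t begin, uintptr_t end)
{
	return pc >= begin && pc < end;
}
```

Because every tagged function lands in one contiguous output section, the test depends only on the section bounds and not on the order in which individual functions were linked.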
5. Mods to the vmlinux.lds files to keep the world in sync:
Added entries to the following output sections:
.text : .text.*
.data : .data.*
.rodata : .rodata.*
__ex_table : __ex_table.*
Note that we can't use .text* and .data* to try and mix code compiled with
and without -ffunction-sections, partly because the intermediate "ld -r"
steps use the linker's default .lds file, where .text and .data are matched
explicitly, but all the .text.* and .data.* sections end up as orphans.
Changed section names as per init.h, rwlock.h, semaphore.h, spinlock.h and
cache.h.
Also:
.data.init_task -> .init_task.data
arch/ppc/vmlinux.lds:
.text.pmac -> .pmac.text
.data.pmac -> .pmac.data
.text.prep -> .prep.text
.data.prep -> .prep.data
.text.chrp -> .chrp.text
.data.chrp -> .chrp.data
.text.apus -> .apus.text
.data.apus -> .apus.data
.text.openfirmware -> .openfirmware.text
.data.openfirmware -> .openfirmware.data
Removed the duplicate "apus" entries from arch/ppc/vmlinux.lds
I didn't change these, but for consistency I probably should have:
.proc.info -> info.proc
.arch.info -> info.arch
KEEP table sections whose contents are only referenced via starting &
ending symbols in the .lds file, to prevent their contents being optimised
away:
__ex_table
__ksymtab
.init.setup
.init.call
.proc.info arch/arm/vmlinux?.lds.in only
.arch.info arch/arm/vmlinux?.lds.in only
.IA_64.unwind arch/ia64/vmlinux.lds.S only
.vtop_fixup arch/ppc/vmlinux.lds only
.ptov_fixup arch/ppc/vmlinux.lds only
.fixup arch/sparc/vmlinux.lds only
Check for an explicit ENTRY(_start) in every .lds file to prevent
everything getting optimised away(!) because no external references exist.
Added entries to identify the start and end of the scheduling functions as
__sched_begin & __sched_end, and the .sched.text section.
Added the __ksymtab section to arch/ppc/vmlinux.lds, where it was missing.
6. I've added a check_exception_table function to check_bugs in the ppc code
to ensure that the table really is in ascending order, since it's not real
noticeable when it's broken until a rogue program passes a bad pointer to
the kernel. This may be temporary; I'm trying to save space, after all.
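The check itself is simple. A sketch of the kind of test involved, assuming a simplified entry layout (struct ex_entry and ex_table_is_sorted are hypothetical names, not the ones used in the patch):

```c
#include <stddef.h>

/* Simplified stand-in for an exception table entry. */
struct ex_entry {
	unsigned long insn;   /* address of an instruction that may fault */
	unsigned long fixup;  /* address to resume at if it does */
};

/*
 * Walk the table once and verify that no entry has a lower insn
 * address than the one before it.
 * Returns 1 if the table is in ascending order, 0 otherwise.
 */
int ex_table_is_sorted(const struct ex_entry *table, size_t count)
{
	size_t i;

	for (i = 1; i < count; i++) {
		if (table[i - 1].insn > table[i].insn)
			return 0;
	}
	return 1;
}
```

A single linear pass at boot is cheap compared with the cost of discovering a mis-sorted table only when a fault fails to find its fixup.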
7. The __get/put_user_asm macros in include/asm-ppc/uaccess.h were making an
explicit reference to ".text", rather than using ".previous" like other
architectures do. This caused me much grief, and should be fixed for
consistency anyway.
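The difference matters once functions no longer live in plain .text. A minimal sketch of the ".previous" pattern, assuming an ELF target and a GNU-compatible assembler (the function name and the .demo_table section are made up for the example, and the entry is a dummy value rather than a real fixup pair):

```c
/*
 * Hypothetical helper: emits one dummy entry into a side section, then
 * returns to whatever section the compiler was emitting into.
 */
int emit_table_entry_demo(void)
{
	__asm__ volatile(
		".section .demo_table,\"a\"\n\t"  /* switch to the side section */
		".long 0\n\t"                      /* dummy table entry */
		".previous"                        /* back to the enclosing section */
	);
	return 1;
}
```

With -ffunction-sections this function is emitted into .text.emit_table_entry_demo, and ".previous" returns there. An explicit ".text" at the same spot would instead switch to plain .text, so the rest of the function would end up in the wrong section.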
8. Updated the commentary in Documentation/exception.txt a tiny bit.
9. Humbly added my name to CREDITS. :-)
This mucks with the exception table somewhat, so if you want to try this out,
make sure you test that a program which generates a bad user space access
doesn't kernel panic. Something like:
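    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* NULL buffer: write() should fail with EFAULT, not panic. */
        ssize_t ret = write(1, NULL, 1);

        perror("write");    /* report errno before printf can change it */
        printf("bad write returned %d\n", (int)ret);
        return 0;
    }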
Also play around with 'ps axl' noting the WCHAN column, to ensure that the
scheduler function stuff still works.
I've noticed that this patch causes simple "unresolved symbol" errors to turn
into masses of relocation overflows from the linker. Sounds like a ld bug.
The patch itself is quite large, so I'll just post a pointer to it here:
http://members.xoom.com/greyhams/linux/patches/2.4/funcsect.patch.gz
Please let me know what you think!
--
Graham Stoney
Principal Hardware/Software Engineer
Canon Information Systems Research Australia
Ph: +61 2 9805 2909  Fax: +61 2 9805 2929