[PATCH] Dead function optimisation for 2.4

From: Graham Stoney (greyham@research.canon.com.au)
Date: Wed Jul 19 2000 - 01:48:53 EST


Due to popular demand (well, one or two people asked for it, anyway) I've
updated my dead function optimisation patch to apply to the latest 2.4 series
kernels. I'm very keen to get feedback from people who can test or otherwise
critique this on other platforms, so please let me know what you think!

Dead Function Optimisation for Linux 2.4 Updated: 19 July 2000
---------------------------------------- ---------------------

The following patch allows gcc/ld to automatically optimise unused functions
and data out of the kernel. In particular, functions which are never called
will now be optimised away, even if other functions in the same object file
are called.
I consider this less error prone than relying on other people to wrap all
combinations of their unused functions in #ifdef CONFIG_..., and it's
particularly good if you're building a kernel for embedded systems which don't
use stuff like /proc fs, support for which isn't always wrapped in #ifdefs.

Doing this has turned out to be distinctly non-trivial, so I've described the
changes in detail below. I can't help thinking that the linker should be
able to do a better job on its own without so much mucking about, but for the
moment --gc-sections appears to be the only viable method.
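
As a rough illustration of the mechanism (a user-space sketch, not code from
the patch; the file name deadfunc.c, the separate main.c and all function
names are made up for the example): with -ffunction-sections each function
gets its own .text.<name> section, so --gc-sections can discard one function
from an object while keeping its neighbours.

```c
/* deadfunc.c: hypothetical example, not part of the patch.
 *
 * One way to try it, assuming a main.c that calls report():
 *   gcc -O2 -ffunction-sections -fdata-sections -c deadfunc.c main.c
 *   gcc -Wl,--gc-sections main.o deadfunc.o -o demo
 *   nm demo | grep dead_helper
 * The last command should find nothing once the section is collected.
 */
#include <stdio.h>

/* Called from report(), so its section stays reachable. */
int live_helper(int x)
{
    return x * 2;
}

/* Never referenced. With -ffunction-sections it sits alone in
 * .text.dead_helper, which --gc-sections is free to drop. */
int dead_helper(int x)
{
    return x * 3;
}

/* Entry point for the example; prints and returns the doubled value. */
int report(int x)
{
    int y = live_helper(x);

    printf("live_helper(%d) = %d\n", x, y);
    return y;
}
```

Without -ffunction-sections both helpers share a single .text section, so the
linker has to keep or drop them together.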

The changes should work for all architectures in the 2.4.x tree. So far
I've been told it works by people who have tested it on these architectures
(if yours isn't listed here, please tell me that it works!):
    ppc

I'd really like to hear from people who can try it on other architectures too.
If you're working on an architecture that isn't in the official source tree,
make equivalent mods to those described below and please let me know how it
goes.

This version of the patch has been generated against linux-2.4.0-test4-pre6.
Hopefully it will apply cleanly against any 2.4 kernel.

The latest version of this patch is available at:
    http://members.xoom.com/greyhams/linux/patches/2.4/funcsect.patch

This patch won't apply cleanly against the 2.2 series, but I have another
patch that does, available here:
    http://members.xoom.com/greyhams/linux/patches/2.2/funcsect.patch

Here is the rationale behind what I've done in the patch:

1. enable gcc's -ffunction-sections/-fdata-sections options, and ld's
   --gc-sections option in the top Makefile, which together work all the
   magic. The Makefile auto-detects when these flags are supported, so you
   need a recent enough gcc and binutils to have them actually turn on.

   That causes a whole host of stuff to break, so fix the resulting damage:

2. The section namespace used by -ffunction-sections (.text.*) clashes with
   those used by some of linux's headers, so I renamed the sections:
   
   include/linux/init.h's:
    .text.init -> .init.text
    .text.exit -> .exit.text
    .data.init -> .init.data
    .data.exit -> .exit.data
    .setup.init -> .init.setup for consistency
    .initcall.init -> .init.call for consistency
    .exitcall.exit -> .exit.call for consistency

   rwlock.h, semaphore.h, spinlock.h:
    .text.lock -> .lock.text

   include/asm-*/cache.h:
    .data.cacheline_aligned -> .cacheline_aligned.data

   And for consistency in a few .lds files:
    .data.page_aligned -> .page_aligned.data

3. The user space exception fixup __ex_table search uses a binary chop. This
   relies on references to instructions which may fault on user space accesses
   being in the table in ascending address order. Unfortunately, a bug in all
   linker versions prior to binutils-2.10 reverses the order of the orphan
   .text.* sections generated using -ffunction-sections when an intermediate
   "ld -r" is done, causing the binary search to fail.

   I tried a number of approaches to work around this, including using "ar"
   instead, and asking the binutils folk to fix the linker bug, which they
   have graciously done. I don't want to force people to upgrade though, so...

   My preferred solution (used in this patch) is to put all the __ex_table
   entries from separate functions into separate sections named
   __ex_table.__FUNCTION__, mirroring what -ffunction-sections does with
   .text. Since the linker bug reorders both sets of sections consistently,
   this keeps the __ex_table sorted with respect to the output .text.* section
   ordering both with and without the linker bug fixed. Hence there should be
   no need to upgrade to the latest binutils in order for this patch to work.

   Assembler code continues to lump everything in .text, with ex_table entries
   in __ex_table. Hence, the patch won't attempt to optimise away dead
   assembler functions. I haven't bothered to add all the explicit .section
   ops required for this to happen.

4. get_wchan needs to be able to identify scheduler functions. It used to do
   this by declaring magic stub functions 'scheduling_functions_start_here'
   and 'scheduling_functions_end_here'. That was a gross hack, and trips over
   the linker bug mentioned above when the stubs get reordered. I changed it
   to put all scheduler functions in a section named .sched.text using a macro
   named __sched, similar to __init. I think this is a cleaner solution: it
   no longer relies on function ordering, and it's the same as the way __init
   functions are handled.

5. Mods to the vmlinux.lds files to keep the world in sync:

   Added entries to the following output sections:
    .text : .text.*
    .data : .data.*
    .rodata : .rodata.*
    __ex_table : __ex_table.*

   Note that we can't use .text* and .data* to try and mix code compiled with
   and without -ffunction-sections, partly because the intermediate "ld -r"
   steps use the linker's default .lds file, where .text and .data are matched
   explicitly, but all the .text.* and .data.* sections end up as orphans.

   Changed section names as per init.h, rwlock.h, semaphore.h, spinlock.h and
   cache.h.

   Also:
    .data.init_task -> .init_task.data

   arch/ppc/vmlinux.lds:
    .text.pmac -> .pmac.text
    .data.pmac -> .pmac.data
    
    .text.prep -> .prep.text
    .data.prep -> .prep.data
    
    .text.chrp -> .chrp.text
    .data.chrp -> .chrp.data
    
    .text.apus -> .apus.text
    .data.apus -> .apus.data
    
    .text.openfirmware -> .openfirmware.text
    .data.openfirmware -> .openfirmware.data

   Removed the duplicate "apus" entries from arch/ppc/vmlinux.lds

   I didn't change these, but for consistency I probably should have:
    .proc.info -> info.proc
    .arch.info -> info.arch

   KEEP table sections whose contents are only referenced via starting &
   ending symbols in the .lds file, to prevent their contents being optimised
   away:
    __ex_table
    __ksymtab
    .init.setup
    .init.call
    .proc.info arch/arm/vmlinux?.lds.in only
    .arch.info arch/arm/vmlinux?.lds.in only
    .IA_64.unwind arch/ia64/vmlinux.lds.S only
    .vtop_fixup arch/ppc/vmlinux.lds only
    .ptov_fixup arch/ppc/vmlinux.lds only
    .fixup arch/sparc/vmlinux.lds only

   Check for an explicit ENTRY(_start) in every .lds file to prevent
   everything getting optimised away(!) because no external references exist.
   
   Added entries to identify the start and end of the scheduling functions as
   __sched_begin & __sched_end, and the .sched.text section.

   Added the __ksymtab section to arch/ppc/vmlinux.lds, where it was missing.

6. I've added a check_exception_table function to check_bugs on ppc
   to ensure that the table really is in ascending order, since it's not
   really noticeable when it's broken until a rogue program passes a bad
   pointer to the kernel. This may be temporary; I'm trying to save space,
   after all.

7. The __get/put_user_asm macros in include/asm-ppc/uaccess.h were making an
   explicit reference to ".text", rather than using ".previous" like other
   architectures do. This caused me much grief, and should be fixed for
   consistency anyway.

8. Updated the commentary in Documentation/exception.txt a tiny bit.

9. Humbly added my name to CREDITS. :-)
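
The binary chop from item 3 and the ordering check from item 6 can be
sketched in user space roughly as follows. This is an illustration only, not
the code in the patch: struct ex_entry, ex_search() and ex_table_sorted() are
hypothetical names, and the entry layout is simplified to a pair of plain
addresses.

```c
#include <stddef.h>

/* Simplified stand-in for an exception table entry (assumed layout). */
struct ex_entry {
    unsigned long insn;   /* address of an instruction that may fault */
    unsigned long fixup;  /* address to resume at if it does */
};

/* Binary chop over a table sorted by ascending insn address.
 * Returns the matching fixup address, or 0 if addr has no entry. */
unsigned long ex_search(const struct ex_entry *tab, size_t n,
                        unsigned long addr)
{
    size_t lo = 0, hi = n;      /* search the half-open range [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (tab[mid].insn == addr)
            return tab[mid].fixup;
        if (tab[mid].insn < addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}

/* Sanity check in the spirit of item 6: returns 1 if the insn addresses
 * are strictly ascending, 0 otherwise. */
int ex_table_sorted(const struct ex_entry *tab, size_t n)
{
    size_t i;

    for (i = 1; i < n; i++)
        if (tab[i - 1].insn >= tab[i].insn)
            return 0;
    return 1;
}
```

On a table whose entries have been reversed, ex_search() can miss entries
that are actually present, which is the failure mode described in item 3.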

This mucks with the exception table somewhat, so if you want to try this out,
make sure you test that a program which generates a bad user space access
doesn't kernel panic. Something like:

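    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* A NULL buffer should make write() fail with EFAULT, not panic. */
        ssize_t ret = write(1, NULL, 1);

        perror("write");    /* report errno before printf can change it */
        printf("bad write returned %ld\n", (long)ret);
        return 0;
    }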

Also play around with 'ps axl' noting the WCHAN column, to ensure that the
scheduler function stuff still works.
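
For anyone unfamiliar with the section trick behind the WCHAN check, here is
a user-space analogue (a sketch under stated assumptions, not the kernel
code). The patch uses a .sched.text section bounded by __sched_begin and
__sched_end from the .lds file; this example instead names the section
sched_text so that GNU ld supplies __start_sched_text and __stop_sched_text
by itself. DEMO_SCHED, demo_schedule(), demo_other() and in_sched_functions()
are all hypothetical names.

```c
#include <stdint.h>

/* Stand-in for the patch's __sched marker: place the function in a
 * dedicated section instead of relying on function ordering. */
#define DEMO_SCHED \
    __attribute__((__section__("sched_text"), __noinline__, __used__))

/* GNU ld defines these for any section whose name is a valid C
 * identifier; the kernel would take its bounds from the .lds file. */
extern char __start_sched_text[], __stop_sched_text[];

DEMO_SCHED int demo_schedule(int x)
{
    return x + 1;
}

/* An ordinary function, left in the default text section. */
int demo_other(int x)
{
    return x - 1;
}

/* Returns nonzero if addr lies inside the scheduler section. Assumes
 * function pointers are plain code addresses, which does not hold on
 * ABIs that use function descriptors. */
int in_sched_functions(uintptr_t addr)
{
    return addr >= (uintptr_t)__start_sched_text &&
           addr < (uintptr_t)__stop_sched_text;
}
```

A stack walker in the style of get_wchan would skip return addresses for
which in_sched_functions() is true and report the first one for which it is
false.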

I've noticed that this patch causes simple "unresolved symbol" errors to turn
into masses of relocation overflows from the linker. Sounds like an ld bug.

The patch itself is quite large, so I'll just post a pointer to it here:
    http://members.xoom.com/greyhams/linux/patches/2.4/funcsect.patch.gz

Please let me know what you think!

-- 
Graham Stoney
Principal Hardware/Software Engineer
Canon Information Systems Research Australia
Ph: +61 2 9805 2909  Fax: +61 2 9805 2929
