Re: [patch] uninline init_waitqueue_*() functions

From: Ingo Molnar
Date: Wed Jul 05 2006 - 16:48:50 EST



* Andrew Morton <akpm@xxxxxxxx> wrote:

> Ingo Molnar <mingo@xxxxxxx> wrote:
> >
> > > > i also tried a config with the best size settings (disabling
> > > > FRAME_POINTER, enabling CC_OPTIMIZE_FOR_SIZE), and this gives:
> > > >
> > > > text data bss dec filename
> > > > 20777768 6076042 3081864 29935674 vmlinux.x32.size.before
> > > > 20748140 6076178 3081864 29906182 vmlinux.x32.size.after
> > > >
> > > > or a 34.8 bytes win per callsite (29K total).
> > > >
> > >
> > > With gcc-4.1.0 on i686, uninlining those three functions as per the
> > > below patch _increases_ the allnoconfig vmlinux's .text from 833456
> > > bytes to 833728.
> >
> > that's just the effect of CONFIG_REGPARM and CONFIG_CC_OPTIMIZE_FOR_SIZE
> > not being set in an allnoconfig. Once i set them the text size evens
> > out:
> >
> > 431348 60666 27276 519290 7ec7a vmlinux.x32.mini.before
> > 431359 60666 27276 519301 7ec85 vmlinux.x32.mini.after
> >
> > compiling without CONFIG_REGPARM on i686 (if you care about text size)
> > makes little sense. It penalizes function calls artificially.
>
> OK, but what happened to the 35-bytes-per-callsite saving?

there are 3 effects i can see:

firstly, allnoconfig implies SMP off, so the spinlock init in
init_waitqueue_head() is not done and it becomes a 2-instruction thing.

secondly, the savings depend on the function size. Uninlining from a
small function (that makes use of the inlined function) can be a loss if
the function call causes more register clobbering. For large functions
we clobber all registers anyway so there's no extra stack saving, etc.

the allnoconfig kernel makes use of these waitqueue ops in smaller
kernel-core functions as well, where the uninlining is a loss.
(sleep_on(), etc.) Furthermore, UP kernel tends to decrease function
sizes too.

larger kernel configs include more non-core subsystems/drivers as well,
which tend to have larger function sizes. There the uninlining win is
larger.

there's a third, smaller effect too: gcc manages to do tail-merging of
the init_waitqueue_head calls, in a handful of cases, further reducing
the cost. I found no such tail-merges done for the 53 callsites in the
allnoconfig kernel.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/