Re: [patch 00/2] improve .text size on gcc 4.0 and newer compilers

From: Ingo Molnar
Date: Mon Jan 02 2006 - 16:32:02 EST



* Andrew Morton <akpm@xxxxxxxx> wrote:

> > > what is the 'deeper problem'? I believe it is a combination of two
> > > (well-known) things:
> > >
> > > 1) people add 'inline' too easily
> > > 2) we default to 'always inline'
> >
> > For example, I add "inline" for static functions which are only called
> > from one place.
> >
> > If I'm able to say "this is static function which is called from one
> > place" I'd do so instead of saying "inline". But omitting the "inline"
> > with hope that some new gcc probably will inline it anyway (on some
> > platform?) doesn't seem like a best idea.
> >
> > But what _is_ the best idea?
>
> Just use `inline'. With gcc-3 it'll be inlined.
>
> With gcc-4 and Ingo's patch it _might_ be inlined. And it _might_ be
> uninlined by the compiler if someone adds a second callsite later on.
> Maybe. We just don't know. That's a problem. Use of __always_inline
> will remove this uncertainty.

i agree with your later points, so this is only a minor nit: why is a
dynamic decision done by the compiler a 'problem' in itself?

It _could_ _become_ a problem if the compiler does it incorrectly, but
that's so true for just about any other dynamic gcc decision: what
registers it opts to use in a hotpath, what amount of loop-unrolling it
does, what machine-ops it chooses, how it schedules them, how it reorders
them, how it generates memory access patterns, etc., etc. Sure, the
compiler can mess up in _any_ of these dynamic decisions, with possibly
bad effects on performance, but that by itself doesn't create some magic
'dynamic inlining is bad' axiom.

In fact, i believe the opposite is true: inlining is arguably best done
dynamically. Whether gcc makes use of that theoretical opening is
another question, but my measurements show that gcc4 does quite a good
job of it. (It certainly does a better job than what we humans did over
the last 5 years, creating 20,000+ inline markers.)

and even if we let gcc do the inlining, a global -finline-limit=0
compile-time flag will essentially turn off all 'hinted' inlining done
by gcc.

> So our options appear to be:
>
> a) Go fix up stupid inlinings (again) or
>
> b) Apply Ingo's patch, then go add __always_inline to places which we
> care about.

note that one of the patches i did (a small one in fact) does exactly
that, for x86: i marked all things __always_inline that allyesconfig
needs inlined.
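
To illustrate the distinction, here is a minimal sketch of forced versus
hinted inlining. It is not taken from the patch: the macro definition is
an assumed GCC-style spelling, and must_inline_add() and hinted_add() are
hypothetical helpers made up for this example.

```c
/* Assumed GCC-style spelling of the macro; guarded in case a system
 * header already provides a definition. */
#ifndef __always_inline
#define __always_inline inline __attribute__((always_inline))
#endif

/* Hypothetical helper: the attribute tells gcc to inline this function
 * regardless of its size heuristics or the optimization level. */
static __always_inline unsigned long must_inline_add(unsigned long a,
						     unsigned long b)
{
	return a + b;
}

/* Hypothetical helper: plain 'inline' is only a hint here, so the
 * compiler may still emit an out-of-line copy and call it. */
static inline unsigned long hinted_add(unsigned long a, unsigned long b)
{
	return a + b;
}
```

Both helpers compute the same result; the only difference is whether the
inlining decision is left to the compiler.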

> Either way, we need to go all over the tree. In practice, we'll only
> bother going over the bits which we most care about (core kernel, core
> networking, a handful of net and block drivers). I suspect many of
> the bad inlining decisions are in poorly-maintained code - we've been
> pretty careful about this for several years.

yes. And this pretty much supports my point: we should flip the meaning
of 'inline' around, from 'always inline', to at least: 'inline only if
gcc thinks so too, if we are optimizing for size'.
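
One way such a flip could be expressed is sketched below. This is not the
actual patch: SKETCH_OPTIMIZE_FOR_SIZE is a hypothetical switch standing
in for whatever option selects size optimization, and sketch_inline is a
made-up name used so that the example does not redefine the 'inline'
keyword itself.

```c
/* SKETCH_OPTIMIZE_FOR_SIZE is hypothetical; define it at build time
 * (e.g. -DSKETCH_OPTIMIZE_FOR_SIZE) to select the "hint only" meaning. */
#ifdef SKETCH_OPTIMIZE_FOR_SIZE
/* Size build: inline only if gcc thinks so too. */
# define sketch_inline inline
#else
/* Otherwise: keep the old 'always inline' behavior. */
# define sketch_inline inline __attribute__((always_inline))
#endif

/* Hypothetical function marked with the switchable annotation. */
static sketch_inline int add_one(int x)
{
	return x + 1;
}
```

The source of add_one() stays the same in both configurations; only the
expansion of the marker changes.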

and i'd be equally happy with making the flip-around even more aggressive
than my first patch-queue did, to e.g. alias 'inline' to 'nothing':

#define inline

and to then remove inline from all .c level files (and most .h level
files) as well - albeit this would punish places that use inline
judiciously.
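
A minimal sketch of what the alias does to existing code follows. The
functions square() and sum_of_squares() are hypothetical and exist only
for this example.

```c
/* As in the message: from here on, 'inline' expands to nothing. */
#define inline

/* After preprocessing this reads 'static int square(int x)'. The
 * compiler is now free to inline it or not, using its own heuristics. */
static inline int square(int x)
{
	return x * x;
}

/* Hypothetical caller; the result is the same whether or not the
 * compiler decides to inline square(). */
int sum_of_squares(int a, int b)
{
	return square(a) + square(b);
}
```

Existing 'inline' annotations keep compiling unchanged under this alias,
which is why the source-level removal can happen afterwards, gradually.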

Even in this case, it is very likely much easier to re-add inlines to
the few places that really benefit from them (even though they don't need
it in the __always_inline sense like vsyscalls or kmalloc()), than to
keep the current 'always inline' logic in place and to hope for a
gradual reduction of the use of the inline keyword...

Ingo