Re: [patch] measurements, numbers about CONFIG_OPTIMIZE_INLINING=y impact

From: Linus Torvalds
Date: Fri Jan 09 2009 - 19:42:36 EST




On Sat, 10 Jan 2009, Andi Kleen wrote:
>
> > What's the cost/benefit of that 4%? Does it actually improve performance?
> > Especially if you then want to keep DWARF unwind information in memory in
> > order to fix up some of the problems it causes? At that point, you lost
>
> dwarf unwind information has nothing to do with this, it doesn't tell
> you anything about inlining or not inlining. It just gives you
> finished frames after all of that has been done.
>
> Full line number information would help, but I don't think anyone
> proposed to keep that in memory.

Yeah, true. Although one of the reasons inlining actually ends up causing
problems is because of the bigger stack frames. That leaves a lot of space
for old stale function pointers to peek through.

With denser stack frames, the stack dumps look better, even without an
unwinder.
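
A minimal sketch of the frame-size effect, assuming GCC-style function
attributes. The helper names and the 256-byte buffer are hypothetical and
only for illustration; the actual frame layout depends on the compiler and
optimization level.

```c
#include <string.h>

#define noinline_fn      __attribute__((noinline))
#define always_inline_fn inline __attribute__((always_inline))

/* Hypothetical helper that needs a 256-byte scratch buffer on the stack. */
static always_inline_fn int checksum_body(const char *s)
{
	char scratch[256];
	int sum = 0;
	size_t i, n = strlen(s);

	if (n >= sizeof(scratch))
		n = sizeof(scratch) - 1;
	memcpy(scratch, s, n);
	for (i = 0; i < n; i++)
		sum += (unsigned char)scratch[i];
	return sum;
}

/* Out-of-line wrapper: the scratch buffer lives in this function's frame. */
static noinline_fn int checksum_outofline(const char *s)
{
	return checksum_body(s);
}

/* Calls the out-of-line copy, so this caller's own frame can stay small. */
int caller_dense(const char *s)
{
	return checksum_outofline(s) + 1;
}

/*
 * The helper body is inlined here, so the scratch buffer may become part of
 * this caller's frame and stay reserved for the whole call, leaving more
 * room for stale values to show up in a raw stack dump.
 */
int caller_fat(const char *s)
{
	return checksum_body(s) + 1;
}
```

Both callers compute the same result; the only intended difference is
where the buffer ends up on the stack.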

> > Does it help I$ utilization (which can speed things up a lot more, and is
> > probably the main reason -Os actually tends to perform better)? Likely
> > not. Sure, shrinking code is good for I$, but on the other hand inlining
> > can actually be bad for I$ density because if you inline a function that
> > doesn't get called, you now fragmented your footprint a lot more.
>
> Not sure that is always true; the gcc basic block reordering
> based on its standard branch prediction heuristics (e.g. < 0 or
> == NULL unlikely or the unlikely macro) might well put it all out of line.

I thought -Os actually disables the basic-block reordering, doesn't it?

And I thought it did that exactly because it generates bigger code and
much worse I$ patterns (i.e. you have a lot of "conditional branch to other
place and then unconditional branch back" instead of "conditional branch
over the non-taken code").

Also, I think we've had about as much good luck with guessing
"likely/unlikely" as we've had with "inline" ;)
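
A minimal sketch of the kind of hint being discussed, assuming the
GCC/Clang builtin __builtin_expect. The macros have the same shape as the
kernel's likely()/unlikely(); the function sum_bytes() is hypothetical.
Whether the cold path is actually moved out of line depends on the compiler
and its flags.

```c
#include <stddef.h>

/* Branch hints built on a GCC/Clang builtin. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: sums a buffer, treating NULL as a rare error case. */
long sum_bytes(const unsigned char *buf, size_t len)
{
	long total = 0;
	size_t i;

	/*
	 * Hinted as rarely true. With block reordering the compiler may place
	 * this return away from the hot path ("branch to other place");
	 * without it, the hot path simply branches over the code.
	 */
	if (unlikely(buf == NULL))
		return -1;

	for (i = 0; i < len; i++)
		total += buf[i];
	return total;
}
```

The hint only affects code layout, never the result, so a wrong guess
costs performance rather than correctness.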

Sadly, apart from some of the "never happens" error cases, the kernel
doesn't tend to have lots of nice patterns. We have almost no loops (well,
there are loops all over, but most of them we hopefully just loop over
once or twice in any good situation), and few really predictable things.

Or rather, they can easily be very predictable under one particular load,
and then totally the other way around under another...

Linus