Re: [PATCH v3 00/14] perf, x86: Haswell LBR call stack support

From: Andi Kleen
Date: Wed Apr 09 2014 - 12:49:06 EST


On Wed, Apr 09, 2014 at 01:48:57PM +0200, Peter Zijlstra wrote:
> On Wed, Feb 26, 2014 at 12:26:43PM -0800, Andy Lutomirski wrote:
> > Speed. FPO saves one register (a big deal on x86_32; not so important
> > on x86_64) but also saves a few cycles on function entry and exit,
> > which is a bigger deal for small functions.
>
> So I thought that LTO was supposed to get rid of a lot of the small
> functions and inline them.

It does that when it can (no indirect calls), when it thinks inlining is
profitable, and when it won't increase code size too much.
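
To illustrate the "no indirect" condition, here is a minimal sketch. All
names (add_one, struct ops, call_direct, call_indirect) are made up for
the example and are not taken from the kernel. Whether a compiler actually
inlines either call depends on its heuristics and options.

```c
#include <stddef.h>

/* Hypothetical small helper. The call target is known at compile/link
 * time, so the inliner is free to consider it. */
static int add_one(int x)
{
	return x + 1;
}

/* Hypothetical ops table, in the style of the kernel's *_operations
 * structs. */
struct ops {
	int (*transform)(int);
};

/* Direct call: a candidate for inlining if the compiler judges it
 * profitable and the code size growth acceptable. */
int call_direct(int x)
{
	return add_one(x);
}

/* Indirect call: the target is only known at run time, unless the
 * compiler can prove what o->transform points to. Without that proof
 * the call cannot be inlined. */
int call_indirect(const struct ops *o, int x)
{
	return o->transform(x);
}
```

Both functions compute the same result when o->transform points at
add_one; the difference is only in what the optimizer can see.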

>
> I've also heard that in practise this is very 'hard', and thus we're
> still stuck with a gazillion small functions (mostly C++ people suffer
> from this).

They need devirtualization, which we cannot do currently in the kernel.

>
> Can anybody give a concise explanation on why LTO doesn't rid us of
> these small functions or point to a web resource that describes the
> problem?

It depends on the code of course.
On one of my LTO builds I have ~10% fewer functions in System.map.

Actual results will of course vary with the config.
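
For anyone who wants to make a similar comparison, a rough sketch of one
way to count is below. It assumes the usual "<address> <type> <name>"
layout of System.map and treats symbols of type 't' or 'T' as functions,
which is only an approximation (the text section also holds some
non-function labels). The function names are hypothetical, and this is
not necessarily how the number above was obtained.

```c
#include <stdio.h>

/* Count text-section symbols in a System.map-style stream.
 * Each entry is expected to look like "<address> <type> <name>". */
long count_text_symbols(FILE *f)
{
	char addr[64], name[512];
	char type;
	long n = 0;

	while (fscanf(f, "%63s %c %511s", addr, &type, name) == 3) {
		if (type == 't' || type == 'T')
			n++;
	}
	return n;
}

/* Percentage by which the count dropped from 'before' to 'after'.
 * The caller must pass before > 0. */
double percent_fewer(long before, long after)
{
	return 100.0 * (double)(before - after) / (double)before;
}
```

Run count_text_symbols() on the System.map of a non-LTO build and of an
LTO build with the same config, then feed both counts to percent_fewer().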

-Andi


--
ak@xxxxxxxxxxxxxxx -- Speaking for myself only.