Re: [PATCH v3 00/14] perf, x86: Haswell LBR call stack support

From: Stephane Eranian
Date: Thu Feb 27 2014 - 04:10:12 EST


On Wed, Feb 26, 2014 at 10:42 PM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
> On Wed, Feb 26, 2014 at 02:34:31PM -0700, David Ahern wrote:
>> On 2/26/14, 1:53 PM, Andi Kleen wrote:
>> >>Is there some reason not to enable frame pointers?
>> >
>> >It makes code slower.
>>

That is what I have been told by compiler people too.

This is especially true of small functions, which C++ object-oriented
code is full of. And that's how large programs are written these
days.

The other problem with FP is that you need to have everything
compiled with it. It is not always obvious how to check this without
going to assembly. There is no indication in the ELF headers, AFAIK.
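
To illustrate what "going to assembly" means in practice, here is a
rough sketch that scans objdump -d style text (x86-64, AT&T syntax) and
lists functions that do not start with the usual push %rbp /
mov %rsp,%rbp prologue. The function names and the sample disassembly
are made up for illustration. It is only a heuristic: compilers may
schedule the prologue differently, so a miss does not prove anything.

```python
import re

# Matches objdump function headers such as "0000000000401106 <main>:".
FUNC_HEADER = re.compile(r"^[0-9a-f]+ <([^>]+)>:$")

# Conventional x86-64 frame-pointer prologue (AT&T syntax).
FP_PROLOGUE = ["push %rbp", "mov %rsp,%rbp"]


def parse_functions(disasm_text):
    """Map each function name to its list of normalised instructions."""
    funcs = {}
    current = None
    for line in disasm_text.splitlines():
        m = FUNC_HEADER.match(line.strip())
        if m:
            current = m.group(1)
            funcs[current] = []
            continue
        # Instruction lines are "addr:<TAB>bytes<TAB>mnemonic operands".
        parts = line.split("\t")
        if current is not None and len(parts) >= 3:
            funcs[current].append(" ".join(parts[2].split()))
    return funcs


def functions_without_fp_prologue(disasm_text):
    """Return functions whose first instructions are not the FP prologue."""
    missing = []
    for name, insns in parse_functions(disasm_text).items():
        # Skip a leading endbr64 (CET), which may precede the prologue.
        head = [i for i in insns if i != "endbr64"][:2]
        if head != FP_PROLOGUE:
            missing.append(name)
    return missing


# Hypothetical disassembly, hand-written for this example.
SAMPLE = (
    "0000000000401106 <with_fp>:\n"
    "  401106:\t55                   \tpush   %rbp\n"
    "  401107:\t48 89 e5             \tmov    %rsp,%rbp\n"
    "  40110a:\t5d                   \tpop    %rbp\n"
    "  40110b:\tc3                   \tret\n"
    "\n"
    "000000000040110c <without_fp>:\n"
    "  40110c:\t8d 04 3f             \tlea    (%rdi,%rdi,1),%eax\n"
    "  40110f:\tc3                   \tret\n"
)

print(functions_without_fp_prologue(SAMPLE))
```

Even this only covers one binary; every shared library the program
loads would need the same check.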


>> Sure there is some overhead because of the push, mov, pop
>> instructions per function. But, take for example the simple program
>> below. Compile with and without frame pointers
>
> I'm not criticizing your choice, just saying that
> it's often not practical to get FP everywhere
> (and I bet you missed some cases too)
>
> <.. micro benchmark snipped...>
>
> The CPU you're using has special hardware to avoid the main
> problems with FP. It can still cause slow downs in other
> cases (e.g. one register less). But there are other
> CPUs where this special hardware is not available.
>
> You may not care about these cases, but other people do.
>
>> >wrong annotations, out of date or broken dwarf library etc.)
>>
>> dwarf is often just not usable:
>

The first problem with the dwarf approach is that it incurs some
overhead during sampling. You need to copy a chunk of the user stack
in each sample. It is not clear how much you need.
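
A back-of-the-envelope sketch of that copy cost, as plain arithmetic.
All numbers are assumptions chosen for illustration: a 1000 Hz sampling
rate, 16 CPUs, and an 8192-byte stack dump per sample (which I believe
is the perf default for dwarf call graphs, but treat it as an
assumption). The result is an upper bound, since less may be copied
when the remaining stack is smaller than the dump size.

```python
def dwarf_copy_rate(sample_hz, stack_dump_bytes, ncpus=1):
    """Upper bound on user-stack bytes copied per second, all CPUs."""
    return sample_hz * stack_dump_bytes * ncpus


# Illustrative values only, not measurements.
rate = dwarf_copy_rate(sample_hz=1000, stack_dump_bytes=8192, ncpus=16)
print(f"{rate / 2**20:.1f} MiB/s")  # prints "125.0 MiB/s"
```

Lowering either the sampling rate or the dump size reduces this
linearly, at the cost of fewer samples or truncated call chains.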

The second problem is security. You are saving random chunks of stack
in the perf.data file. Who knows what they contain? In many environments
this is a showstopper.

The Haswell LBR call stack is a good compromise, though as Andi
pointed out it has its tradeoffs. It does not work in all
cases. But it has the speed and the security. It is model-specific,
but I can live with that. PMU features always come as incremental
improvements.

> I agree (although I haven't seen that error before)
>
>
>> That is a huge difference. Not to mention the fact the dwarf file is
>> useless which means radically lowering sample rate and increasing
>> mmap size.
>
> Yep.
>
> It's just fundamentally inefficient for profiling.
>
> -Andi