Re: [BUG REPORT] perf tools: x86_64: Broken calllchain when sampling taken at 'callq' instruction

From: Wangnan (F)
Date: Wed Nov 18 2015 - 03:55:08 EST




On 2015/11/18 16:42, Wangnan (F) wrote:


On 2015/11/18 16:20, Ingo Molnar wrote:
* Wangnan (F) <wangnan0@xxxxxxxxxx> wrote:

On 2015/11/18 15:20, Wangnan (F) wrote:
Hi all,

While analyzing Jiri's patchset [1] I found a DWARF unwind problem.
On x86, when a sample is taken at a 'callq' instruction, DWARF-based
stack unwinding always fails.

I compiled a small C source file with debug information, with the
frame pointer omitted and optimization disabled:

$ gcc -g -O0 -fomit-frame-pointer ./test_dwarf_unwind.c -o ./test_dwarf_unwind

For anyone who wants to test it, here is the test code I used:

#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

static volatile int x = 0;

int funcc(void)
{
	struct timeval tv1, tv2;
	unsigned long us1, us2;

	gettimeofday(&tv1, NULL);

	us1 = tv1.tv_sec * 1000000 + tv1.tv_usec;

	while (1) {
		x = x + 100;
		gettimeofday(&tv2, NULL);
		us2 = tv2.tv_sec * 1000000 + tv2.tv_usec;
		if (us2 - us1 >= 3000000)
			break;
	}
	return x;
}

int funcb(void) { return funcc(); }
int funca(void) { return funcb(); }
int main() { funca(); return 0; }

What CPU model is this, and what event was used - PEBS perhaps? This might be some
sort of PMU sampling bug/quirk/misfeature - or perhaps a kernel side fixup that
went bad?

$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 60
model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
stepping : 3
microcode : 0x1c
cpu MHz : 3600.000
cache size : 8192 KB
physical id : 0
siblings : 8
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm ida arat epb pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs :
bogomips : 7183.88
clflush size : 64
cache_alignment : 64
address sizes : 39 bits physical, 48 bits virtual
power management:


The perf command line is:

# ./perf record -g -F 9 --call-graph dwarf ./test_dwarf_unwind

This uses the default event, which has precise_ip == 2, so PEBS is used.
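
For illustration, the settings named above correspond roughly to the following perf_event_attr fields. This is only a sketch: make_cycles_attr() is a hypothetical helper, the choice of sample_type bits is my assumption about what a DWARF call graph needs (user registers plus a user stack dump), and the real perf tool sets many more fields than this.

```c
#include <linux/perf_event.h>
#include <string.h>

/*
 * Hypothetical helper, not code from perf: fill in only the fields
 * that matter for this report.
 */
struct perf_event_attr make_cycles_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;	/* the 'cycles' event */
	attr.freq = 1;				/* interpret sample_freq as Hz */
	attr.sample_freq = 9;			/* -F 9 */
	attr.precise_ip = 2;			/* same level as 'cycles:pp' */

	/* Assumption: DWARF unwinding needs user regs and a stack dump. */
	attr.sample_type = PERF_SAMPLE_IP |
			   PERF_SAMPLE_REGS_USER |
			   PERF_SAMPLE_STACK_USER;
	return attr;
}
```

The sketch deliberately stops before calling perf_event_open(), since the register mask and stack dump size are architecture and tool specific.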


I tested 'cycles', 'cycles:p' and 'cycles:pp'. Only 'cycles:pp' captures
samples at callq, so maybe this is a PEBS problem?
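
For reference, the three events differ only in the precise_ip level they request. A minimal sketch of that mapping, assuming each 'p' modifier raises precise_ip by one; precise_level() is a hypothetical helper for illustration, not perf's actual event parser, and it ignores every other modifier letter:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not perf's real parser: count the 'p'
 * modifiers that follow the last ':' in an event string.
 */
int precise_level(const char *event)
{
	const char *mod = strrchr(event, ':');
	int level = 0;

	if (mod == NULL)
		return 0;	/* no modifier, no precision requested */

	/* Skip the ':' itself, then count each 'p'. */
	for (mod++; *mod != '\0'; mod++) {
		if (*mod == 'p')
			level++;
	}
	return level;
}
```

With this mapping, 'cycles' requests level 0, 'cycles:p' level 1 and 'cycles:pp' level 2, so only the last one matches the precise_ip == 2 setting of the default event.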

Thank you.

