PEBS level 2/3 breaks dwarf unwinding! [WAS: Re: Broken dwarf unwinding - wrong stack pointer register value?]

From: Milian Wolff
Date: Thu Nov 01 2018 - 18:08:30 EST


On Tuesday, 30 October 2018 23:34:35 CET Milian Wolff wrote:
> On Wednesday, 24 October 2018 16:48:18 CET Andi Kleen wrote:
> > > Can someone at least confirm whether unwinding from a function prologue via
> > > .eh_frame (but without .debug_frame) should actually be possible?
> >
> > Yes it should be possible. Asynchronous unwind tables should work
> > from any instruction.

<snip>

> We can find `7f91345bdaf8+1 = 7f91345bdaf9` at offset 16 (search for
> "f9 da 5b 34 91 7f"). Using that address makes unwinding work for this sample.
> What could be the reason for this shift?
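
The byte search quoted above works because x86-64 stores 64-bit values little-endian, so the address 0x7f91345bdaf9 appears in a raw stack dump as the bytes "f9 da 5b 34 91 7f 00 00". A minimal bash sketch of that conversion is shown below; the helper name `le_bytes` is hypothetical, and the sketch assumes GNU coreutils (`fold`, `tac`) are available.

```shell
# Hypothetical helper: print the 8 bytes of a 64-bit hex address in
# little-endian order, i.e. the order they occupy in an x86-64 stack dump.
# Assumes bash (64-bit printf arithmetic) and GNU coreutils (fold, tac).
le_bytes() {
    printf '%016x\n' "0x$1" | fold -w2 | tac | tr '\n' ' ' | sed 's/ $//'
}

# The address from the quoted mail: 7f91345bdaf8 + 1
addr=$(printf '%x' $((0x7f91345bdaf8 + 1)))
echo "$addr"          # 7f91345bdaf9
le_bytes "$addr"      # f9 da 5b 34 91 7f 00 00
```

The first six bytes of that output are exactly the search string used in the quoted mail; the two trailing zero bytes are the unused high bytes of a canonical user-space address.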

I believe I have found the culprit: PEBS seems to be at fault here, i.e. the
RIP/RSP values and the user stack (ustack) dump of the sample simply don't fit
together.

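Check this out. Each loop below records ten runs and prints, per run, how many
callchains have an `[unknown]` frame directly below `hypot_finite`, i.e. how
often unwinding failed at that point: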

```
$ for i in $(seq 10); do
    perf record -q -e "cycles:" --call-graph dwarf ./cpp-inlining > /dev/null
    perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"
  done
0
0
0
0
0
0
0
0
0
0

$ for i in $(seq 10); do
    perf record -q -e "cycles:p" --call-graph dwarf ./cpp-inlining > /dev/null
    perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"
  done
0
0
0
0
0
0
0
0
0
0

$ for i in $(seq 10); do
    perf record -q -e "cycles:pp" --call-graph dwarf ./cpp-inlining > /dev/null
    perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"
  done
37
39
35
28
40
39
29
37
31
26

$ for i in $(seq 10); do
    perf record -q -e "cycles:ppp" --call-graph dwarf ./cpp-inlining > /dev/null
    perf script | pcre2grep -c -M "hypot_finite.*\n.*\[unknown\]"
  done
79
70
76
77
70
90
64
78
86
74
```

Note how precise levels 0 and 1 do not produce a single sample where unwinding
fails, while precise level 2 produces some (26 to 40 per run) and precise
level 3 roughly doubles that amount (64 to 90 per run).
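
To put a number on the difference between the two precise levels, here is a small bash/awk sketch that averages the per-run counts from the transcript above. The `mean` helper is a hypothetical convenience, not part of perf, and the counts are copied verbatim from the runs shown.

```shell
# Hypothetical helper: arithmetic mean of its numeric arguments.
mean() { printf '%s\n' "$@" | awk '{ s += $1 } END { print s / NR }'; }

# Per-run failure counts copied from the cycles:pp and cycles:ppp runs
pp=$(mean 37 39 35 28 40 39 29 37 31 26)
ppp=$(mean 79 70 76 77 70 90 64 78 86 74)

echo "cycles:pp  mean: $pp"     # cycles:pp  mean: 34.1
echo "cycles:ppp mean: $ppp"    # cycles:ppp mean: 76.4
awk -v a="$ppp" -v b="$pp" 'BEGIN { printf "ratio: %.2f\n", a / b }'  # ratio: 2.24
```

A ratio of about 2.2 is consistent with the "roughly 2x" increase from precise level 2 to level 3.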

I can currently reproduce this pattern on two different Intel CPUs, each
running a different kernel version:

Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz with 4.18.16-arch1-1-ARCH
Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz with 4.14.78-1-lts

Could someone else try this? Is AMD with IBS affected as well? What about
newer or different Intel CPUs?

Better yet, can someone come up with a fix for this on Intel with the maximum
precise level?

Thanks

--
Milian Wolff | milian.wolff@xxxxxxxx | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
