Re: Broken dwarf unwinding - wrong stack pointer register value?

From: Milian Wolff
Date: Sun Oct 21 2018 - 16:32:26 EST


On Sunday, 21 October 2018 00:39:51 CEST Milian Wolff wrote:
> Hey all,
>
> I'm on the quest to figure out why perf regularly fails to unwind (some)
> samples. I am seeing very strange behavior, where an apparently wrong stack
> pointer value is read from the register - see below for more information and
> the end of this (long) mail for my open questions. Any help would be
> greatly appreciated.
>
> I am currently using this trivial C++ code to reproduce the issue:
>
> ```
> #include <cmath>
> #include <complex>
> #include <iostream>
> #include <random>
>
> using namespace std;
>
> int main()
> {
>     uniform_real_distribution<double> uniform(-1E5, 1E5);
>     default_random_engine engine;
>     double s = 0;
>     for (int i = 0; i < 10000000; ++i) {
>         s += norm(complex<double>(uniform(engine), uniform(engine)));
>     }
>     cout << s << '\n';
>     return 0;
> }
> ```
>
> I compile it with `g++ -O2 -g` and then record it with
> `perf record --call-graph dwarf`. Using perf script, I then see e.g.:

With my patch to regularly flush the perf script output buffer, we can now
easily find all broken backtraces and the corresponding debug output via:

$ perf script --ns -v |& awk -v RS='' '/\[unknown\]/ {print "\n"$0}'

I've pasted the output of the above command from my machine here:
https://paste.kde.org/pmyxwkk1k

This contains 139 samples with broken unwinding, out of 2350 samples in total,
so about 6% of all samples are broken.
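
As a rough sketch, the same filtering and counting could also be done in
Python. It assumes, like the awk command above, that samples in the
`perf script -v` output are separated by blank lines. The function names
are made up for illustration and are not part of perf.

```python
import re


def split_samples(text):
    """Split text into blank-line-separated records, similar to awk RS=''."""
    records = re.split(r"\n{2,}", text.strip("\n"))
    return [rec for rec in records if rec]


def broken_samples(records):
    """Keep only the records whose backtrace contains an [unknown] frame."""
    return [rec for rec in records if "[unknown]" in rec]


def broken_ratio(n_broken, n_total):
    """Fraction of broken samples; 0.0 if there are no samples at all."""
    return n_broken / n_total if n_total else 0.0
```

With the numbers from my log, `broken_ratio(139, 2350)` comes out at
roughly 0.059, which is where the "about 6%" comes from.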

In many cases, the first accessed memory is 0 because a too-low offset into
the stack is computed from the SP value, similar to the scenario I described
in my initial mail. In other cases we read garbage addresses such as

unwind: access_mem addr 0x7ffc80811cf0 val 408195dfbda90580, offset 24

In all cases except for the two samples at the very start and end of this
log, the last offset encountered in access_mem is lower than 72. Remember what
I wrote in the initial mail: when I manually hacked the access_mem function to
use 72 for one of the broken samples, unwinding magically worked again...
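
To check the "last offset is lower than 72" observation mechanically, a
small Python sketch along the following lines could be used. The regular
expression is an assumption based only on the single access_mem debug line
quoted above, and the helper names are hypothetical.

```python
import re

# Assumed debug line format, taken from the one example line in this mail:
#   unwind: access_mem addr 0x7ffc80811cf0 val 408195dfbda90580, offset 24
ACCESS_MEM = re.compile(
    r"unwind: access_mem addr (0x[0-9a-fA-F]+) val ([0-9a-fA-F]+), offset (\d+)"
)


def access_mem_offsets(record):
    """Return all access_mem offsets of one sample, in order of appearance."""
    return [int(m.group(3)) for m in ACCESS_MEM.finditer(record)]


def last_offset_below(record, threshold=72):
    """True if the sample's last access_mem offset is lower than threshold."""
    offsets = access_mem_offsets(record)
    return bool(offsets) and offsets[-1] < threshold
```

Running `last_offset_below` over every broken sample would show which ones
fit the pattern and which ones (such as the first and last sample in my
log) do not.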

With this additional data, can anyone shed some light on what's potentially
going on here? How can we improve this situation?

Thanks
--
Milian Wolff | milian.wolff@xxxxxxxx | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts
