Re: [BUG?] perf: dwarf unwind doesn't work correctly on aarch64

From: Masami Hiramatsu
Date: Fri Mar 24 2017 - 09:01:05 EST


On Thu, 23 Mar 2017 22:24:01 -0500
Kim Phillips <kim.phillips@xxxxxxx> wrote:

> On Thu, 23 Feb 2017 16:50:18 +0900
> Masami Hiramatsu <mhiramat@xxxxxxxxxx> wrote:
>
> [sorry for the delay, I just saw this]
>
> > perf record -g dwarf (and perf report) doesn't show correct callchain
> > on aarch64. Here is how to reproduce it.
> ...
> > # Samples: 6K of event 'cpu-clock:u'
> > # Event count (approx.): 1623750000
> > #
> > # Children Self Command Shared Object Symbol
> > # ........ ........ ....... ............. ..........................
> > #
> > 17.21% 17.21% main main [.] func2
> > |
> > ---func2
> >
> > 17.09% 17.09% main main [.] func1
> > |
> > ---func1
> >
> > 16.67% 16.67% main main [.] main
> > |
> > ---main
> > .....
> >
> > So, as you can see, the call graph reported each function has been
> > called from itself. If I report it with fp as below, perf reported
> > correct callgraph.
> ...
> > I guess there is a bug in libunwind on aarch64 or we missed to pass
> > the stack data to libunwind. (BTW, it works correctly on arm32)
>
> Trying to replicate this on a debian 9 ("stretch") arm64 box:

I'm using debian 8 ("jessie"), but I can try debian 9 too.

> Building acme's 'perf/urgent' branch (currently with the tag
> perf-urgent-for-mingo-4.11-20170317), natively (cd tools; make clean;
> make DEBUG=5 -C perf) shows this system has unwind support:
>
> Auto-detecting system features:
> ... dwarf: [ on ]
> ... dwarf_getlocations: [ on ]
> ... glibc: [ on ]
> ... gtk2: [ on ]
> ... libaudit: [ on ]
> ... libbfd: [ on ]
> ... libelf: [ on ]
> ... libnuma: [ on ]
> ... numa_num_possible_cpus: [ on ]
> ... libperl: [ OFF ]
> ... libpython: [ on ]
> ... libslang: [ on ]
> ... libcrypto: [ on ]
> ... libunwind: [ on ]
> ... libdw-dwarf-unwind: [ on ]
> ... zlib: [ on ]
> ... lzma: [ on ]
> ... get_cpuid: [ OFF ]
> ... bpf: [ on ]
>
> for which an apt search unwind returns the version:
>
> libunwind-dev/testing,now 1.1-4.1 arm64 [installed]
> library to determine the call-chain of a program - development
> libunwind8/testing,now 1.1-4.1 arm64 [installed,automatic]
> library to determine the call-chain of a program - runtime

I've tried the same version and also 1.2, and neither works.

>
> continuing, and ignoring the no debug_frame support perf configure
> mentions:
>
> Makefile.config:421: No debug_frame support found in libunwind-aarch64
> Makefile.config:480: No debug_frame support found in libunwind

Hmm, this suggests that --call-graph dwarf may not be using debuginfo, right?
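
One way to check which unwind tables the test binary actually carries is to
look at its section headers. As far as I understand, -funwind-tables makes gcc
emit .eh_frame, while .debug_frame comes from the debuginfo, so seeing which of
the two is present may hint at what the unwinder has to work with. The helper
below is only a sketch: the function name unwind_sections is made up for
illustration, and it assumes readelf from binutils is installed.

```shell
# Hypothetical helper (not part of perf or libunwind): reads the output of
# `readelf -S -W` on stdin and prints each unwind-related section name once.
unwind_sections() {
    grep -oE '\.(eh_frame(_hdr)?|debug_frame)' | sort -u
}

# Usage, assuming ./main is the test binary built above:
#   readelf -S -W ./main | unwind_sections
```

If only .eh_frame shows up, the missing debug_frame support in libunwind
should not matter for this binary, though that is my assumption and not
something I have confirmed.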

> $ ./perf --version
> perf version 4.10.rc4.ge7ede72
> $ gcc --version
> gcc (Debian 6.3.0-6) 6.3.0 20170205
> Copyright (C) 2016 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> $ gcc -O0 -ggdb3 -funwind-tables -o main main.c
> $ ./perf record -g --call-graph dwarf,1024 -e cpu-clock:u -o /tmp/perf.data -- ./main
> ^C[ perf record: Woken up 121 times to write data ]
> [ perf record: Captured and wrote 30.154 MB /tmp/perf.data (22975 samples) ]
>
> $ ./perf --no-pager report -i /tmp/perf.data --stdio
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 22K of event 'cpu-clock:u'
> # Event count (approx.): 5743750000
> #
> # Children Self Command Shared Object Symbol
> # ........ ........ ....... ............. .....................
> #
> 100.00% 8.14% main main [.] main
> |
> |--91.86%--main
> | func0
> | |
> | --76.41%--func1
> | |
> | --60.82%--func2
> | |
> | --45.31%--func3
> | |
> | --30.17%--func4
> | |
> | --15.04%--func
> |
> --8.14%--__libc_start_main
> main
> ...
>
> which looks like it should, i.e., I can't reproduce.

Sounds like good news! I'll test again on Debian 9.

>
> You mentioned you're using the 'latest' sources for libunwind, etc.,
> but can you provide more exact details like commit IDs, and what, if
> anything, is being cross-built vs. native?

I'm using qemu-user-static to install the rootfs (via debootstrap) and perf.
To run the test code and perf, I'm currently using qemu-system-aarch64.
So it's a kind of native build.

Thank you!

--
Masami Hiramatsu <mhiramat@xxxxxxxxxx>