Re: [PATCH] perf trace: Add support for printing call chains on sys_exit events.
From: Milian Wolff
Date: Sat Apr 09 2016 - 07:45:00 EST
On Freitag, 8. April 2016 15:18:53 CEST Arnaldo Carvalho de Melo wrote:
> Em Fri, Apr 08, 2016 at 02:57:54PM -0300, Arnaldo Carvalho de Melo escreveu:
> > Em Fri, Apr 08, 2016 at 01:34:15PM +0200, Milian Wolff escreveu:
> > > Now, one can print the call chain for every encountered sys_exit
> > > event, e.g.:
> > >
> > > Note that it is advised to increase the number of mmap pages to
> > > prevent event losses when using this new feature. Often, adding
> > > `-m 10M` to the `perf trace` invocation is enough.
> > >
> > > This feature is also available in strace when built with libunwind
> > >
> > > via `strace -k`. Performance wise, this solution is much better:
> > > $ time find path/to/linux &> /dev/null
> > >
> > > real 0m0.051s
> > > user 0m0.013s
> > > sys 0m0.037s
> > >
> > > $ time perf trace -m 800M --call-graph dwarf find path/to/linux &>
> > > /dev/null
> > >
> > > real 0m2.624s
> > > user 0m1.203s
> > > sys 0m1.333s
> > >
> > > $ time strace -k find path/to/linux &> /dev/null
> > >
> > > real 0m35.398s
> > > user 0m10.403s
> > > sys 0m23.173s
> > >
> > > Note that it is currently not possible to configure the print output.
> > > Adding such a feature, similar to what is available in `perf script`
> > > via its `--fields` knob can be added later on.
> >
> > You mixed up multiple changes in one single patch, I'll break it down
> > while testing, and before pushing upstream.
>
> Expanding a bit the audience:
>
> First test, it works, great! But do we really need that address? I guess
> not, right, perhaps via some callchain parameter, to tell what we want to
> see? But by default knowing the function name + DSO seems enough, no?
...
> Yeah, you agree with that, now that I read the patch 8-):
>
> + /* TODO: user-configurable print_opts */
> + unsigned int print_opts = PRINT_IP_OPT_IP
;-)
I even tried to make the code of `perf script` reusable for `perf trace`, but
stopped once I realised that it currently relies on the existance of a
`perf_session`, which does not exist when we do live tracing. It only exists
for replaying in `builtin-trace.c`. So it involves some more refactoring which
I did not have the time for.
<snip>
> Better, but perhaps we should try aligning, up to a limit, the function
> names/DSOs?
>
> [root@jouet bpf]# trace -e nanosleep --call-graph dwarf usleep 1
> 0.063 ( 0.063 ms): usleep/6132 nanosleep(rqtp: 0x7ffd1b7a8e70
> ) = 0 syscall_slow_exit_work
> ([kernel.kallsyms]) do_syscall_64 ([kernel.kallsyms])
> return_from_SYSCALL_64 ([kernel.kallsyms]) __nanosleep
> (/usr/lib64/libc-2.22.so) usleep (/usr/lib64/libc-2.22.so)
> [unknown] (/usr/bin/usleep) __libc_start_main
> (/usr/lib64/libc-2.22.so) [unknown] (/usr/bin/usleep)
> [root@jouet bpf]#
>
> wdyt?
Yes, sounds good. Many profilers I've worked with always dump the IP, so I
thought we should do it here as well. `perf script` e.g. does it. Could we
maybe print the IP if the symbol is [unknown]?
> Also, after this initial support is in, I think the next step is to
> allow per syscall configs, like we have for per tracepoints, i.e. this
> should be possible:
>
> # trace -e nanosleep(call-graph=dwarf),socket -a
>
> And then we would get callchains just for nanosleep calls, not for
> socket ones. We then need to think how to ask that efficiently to the
> kernel, in this case it should be instead of using
> raw_syscalls:sys_enter + tracepoint filters set via ioctl, to use
> syscalls:sys_{enter,exit}_nanosleep, with callgraphs +
> syscalls:sys_{enter,exit}_socket, without.
>
> Doing it this way allows us to avoid asking callchains for a lot of
> events when we want just for a few ones, to reduce overhead.
Yep, sounds useful for some more specific use-cases. For me, this patch is
sufficient as I'd just do:
$ trace -e nanosleep --call-graph=dwarf ...
What I think is more important though is to make sure we only ask for
callchains on the sys_exit events. Afaik, my patch will do it also for the
sys_enter which is just additional cost with no benefit? So fixing that first
is I think even more important, but I don't know how.
> Anyway, I think I'll just break this down into multiple patches and then
> we can work on these other aspects.
Yes, but note that I'll be busy and then on vacation for the next two weeks.
I'll get back to this after wards.
> David, ah, his patch floated on the linux-perf-users mailing list, easy
> one once the thread->priv one got out of the way (it was being used by
> builtin-trace.c and the unwind code, ugh).
>
> Thanks,
Same to you, cheers!
--
Milian Wolff | milian.wolff@xxxxxxxx | Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt ExpertsAttachment:
smime.p7s
Description: S/MIME cryptographic signature