Re: [RFC PATCH 0/2] Perf support to SDT markers
From: Hemant
Date: Tue Sep 03 2013 - 09:23:28 EST
On 09/03/2013 02:47 PM, Masami Hiramatsu wrote:
(2013/09/03 17:25), Ingo Molnar wrote:
* Hemant Kumar Shaw <hkshaw@xxxxxxxxxxxxxxxxxx> wrote:
This series adds support to perf to list and probe into the SDT markers.
The first patch implements listing of all the SDT markers present in
the ELFs (executables or libraries). The SDT markers are present in the
.note.stapsdt section of the elf. That section can be traversed to list
all the markers. Recognition of markers follows the SystemTap approach.
The second patch will allow perf to probe into these markers. This is
done by writing the marker name and its offset into the
uprobe_events file in the tracing directory.
Then, perf tools can be used to analyze perf.data file.
Please provide a better high level description that explains the history
and scope of SDT markers, how SDT markers get into binaries, how they can
be used for probing, a real-life usage example that shows something
interesting not possible via other ways, etc.
Indeed, and I'd also like to know which versions of SDT this supports,
and where we can find the technical documentation for it. As far as I know,
the previous(?) SDT implementation also involved ugly semaphores.
Are those already gone?
Thank you,
Here is an overview and a high-level-description:
Goal:
Probe DTrace-style markers (SDT) present in user space applications.
Scope:
Put probe points at SDT markers in user space and also probe them using
perf.
Why support SDT markers? :
Lots of applications already use SDT markers today, for example:
PostgreSQL, MySQL, Mozilla, Perl, Python, Java, Ruby, libvirt, QEMU, glib
Developers place these markers at important points in the code. The
markers have negligible overhead when not enabled. Once they are enabled,
we can probe at these places and collect useful information such as the
values of the arguments.
How to add SDT markers into user applications:
The header sys/sdt.h must be present. The sys/sdt.h used here is version 3.
If it is not present, install the systemtap-sdt-devel package.
I will show this through a simple example.
- Create a file with a .d extension that declares the provider name and,
inside it, the name of each marker.
$ cat probes.d
provider user_app {
probe foo_start();
probe fun_start();
};
- Now create the probes.h and probes.o files :
$ dtrace -C -h -s probes.d -o probes.h
$ dtrace -C -G -s probes.d -o probes.o
- A program using the markers:
$ cat user_app.c
#include <stdio.h>
#include "probes.h"

void foo(void)
{
	USER_APP_FOO_START();
	printf("This is foo\n");
}

void fun(void)
{
	USER_APP_FUN_START();
	printf("Inside fun\n");
}

int main(void)
{
	printf("In main\n");
	foo();
	fun();
	return 0;
}
- Compile it and also provide probes.o file to linker:
$ gcc user_app.c probes.o -o user_app
- Now use perf to list the markers in the app:
# perf probe --list -S -x ./user_app
user_app:foo_start
user_app:fun_start
Total markers = 2
- And then use perf probe to add a probe point :
# perf probe -S -x ./user_app foo_start
Added new event :
event = foo_start (on 0x530)
You can now use it on all perf tools such as :
perf record -e probe_user:foo_start -aR sleep 1
# perf record -e probe_user:foo_start -aR ./user_app
In main
This is foo
Inside fun
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.235 MB perf.data (~10279 samples) ]
- Then use perf tools to analyze it.
# perf report --stdio
# ========
# captured on: Tue Sep 3 16:19:55 2013
# hostname : hemant-fedora
# os release : 3.11.0-rc3+
# perf version : 3.9.4-200.fc18.x86_64
# arch : x86_64
# nrcpus online : 2
# nrcpus avail : 2
# cpudesc : QEMU Virtual CPU version 1.2.2
# cpuid : GenuineIntel,6,2,3
# total memory : 2051912 kB
# cmdline : /usr/bin/perf record -e probe_user:foo_start -aR ./user_app
# event : name = probe_user:foo_start, type = 2, config = 0x38e, config1
= 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, excl_host = 0,
excl_guest = 1, precise_ip = 0
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: software = 1, tracepoint = 2, breakpoint = 5
# ========
#
# Samples: 1 of event 'probe_user:foo_start'
# Event count (approx.): 1
#
# Overhead Command Shared Object Symbol
# ........ ........ ............. .......
#
100.00% user_app user_app [.] foo
#
# (For a higher level overview, try: perf report --sort comm,dso)
#
We can see the markers in libvirt (if it is compiled with --with-dtrace
option) :
# perf probe -l -S -x /lib64/libvirt.so.0.10.2
libvirt:event_poll_purge_timeout
libvirt:event_poll_purge_handle
libvirt:event_poll_remove_handle
libvirt:event_poll_add_timeout
libvirt:event_poll_update_timeout
libvirt:event_poll_remove_timeout
libvirt:event_poll_update_handle
libvirt:event_poll_add_handle
libvirt:event_poll_run
libvirt:event_poll_dispatch_timeout
libvirt:event_poll_dispatch_handle
libvirt:object_new
libvirt:object_unref
libvirt:object_dispose
libvirt:object_ref
libvirt:rpc_client_msg_tx_queue
libvirt:rpc_client_msg_rx
libvirt:rpc_client_dispose
libvirt:rpc_client_new
libvirt:rpc_client_msg_tx_queue
libvirt:rpc_server_client_new
libvirt:rpc_server_client_dispose
libvirt:rpc_server_client_msg_tx_queue
libvirt:rpc_server_client_msg_rx
libvirt:rpc_keepalive_dispose
libvirt:rpc_keepalive_send
libvirt:rpc_keepalive_timeout
libvirt:rpc_keepalive_new
libvirt:rpc_keepalive_start
libvirt:rpc_keepalive_stop
libvirt:rpc_keepalive_received
libvirt:rpc_socket_new
libvirt:rpc_socket_dispose
libvirt:rpc_socket_send_fd
libvirt:rpc_socket_recv_fd
- And then use perf to probe into any marker:
# perf probe -S -x /lib64/libvirt.so.0.10.2 rpc_client_msg_tx_queue
Added new event :
event = rpc_client_msg_tx_queue (on 0x1462d9)
You can now use it on all perf tools such as :
perf record -e probe_libvirt:rpc_client_msg_tx_queue -aR sleep 1
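As a rough illustration of what "writing the marker name and its offset
into the uprobe_events file" amounts to, here is a minimal sketch that
builds such a line in the kernel's uprobe event syntax
(p:GROUP/EVENT PATH:OFFSET). The function name sdt_format_uprobe and its
validation rule are made up for this example; this is not the code from
the patch, and the group name is simply taken from the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one uprobe definition of the form
 *     p:GROUP/EVENT PATH:0xOFFSET
 * into buf. Returns the length written, or -1 if the names are
 * unusable or the buffer is too small.
 */
int sdt_format_uprobe(char *buf, size_t size, const char *group,
		      const char *event, const char *path,
		      unsigned long offset)
{
	int n;

	assert(buf && group && event && path);

	/* The line is space-delimited and uses '/' and ':' as separators,
	 * so reject group/event names that contain them (or are empty). */
	if (!*group || !*event ||
	    strpbrk(group, " \t\n/:") || strpbrk(event, " \t\n/:"))
		return -1;

	n = snprintf(buf, size, "p:%s/%s %s:0x%lx", group, event, path, offset);
	if (n < 0 || (size_t)n >= size)
		return -1;	/* output error or truncation */
	return n;
}
```

Called with "probe_libvirt", "rpc_client_msg_tx_queue",
"/lib64/libvirt.so.0.10.2" and 0x1462d9, this produces the line that
would then be appended to uprobe_events in the tracing directory.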
This link shows an example of marker probing with SystemTap:
https://sourceware.org/systemtap/wiki/AddingUserSpaceProbingToApps
- Markers in binaries :
These SDT markers are described in the ELF section named
".note.stapsdt".
Each note there records the name of the marker, its provider, type,
location, base address, semaphore address and arguments.
We can retrieve these values using the members name_off and desc_off in
the Nhdr structure. When a marker is not enabled, it is present in the
code as a nop.
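To make the note contents more concrete, here is a minimal sketch of
decoding one note descriptor. It assumes the version 3 layout described
in the SystemTap documentation: three addresses (location, base,
semaphore) followed by three NUL-terminated strings (provider, name,
arguments). It also assumes a 64-bit ELF in host byte order. The struct
and function names are hypothetical and this is not the parser from the
patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical holder for one decoded marker (not a perf structure). */
struct sdt_marker {
	uint64_t pc;		/* location of the marker's nop */
	uint64_t base;		/* base address recorded at link time */
	uint64_t semaphore;	/* 0 when the marker has no semaphore */
	const char *provider;
	const char *name;
	const char *args;	/* may be an empty string */
};

/* Return the NUL-terminated string starting at *pos and advance *pos
 * past its terminator; NULL if no terminator is found within len. */
static const char *next_str(const unsigned char *desc, size_t len, size_t *pos)
{
	const unsigned char *start, *end;

	if (*pos >= len)
		return NULL;
	start = desc + *pos;
	end = memchr(start, '\0', len - *pos);
	if (!end)
		return NULL;
	*pos = (size_t)(end - desc) + 1;
	return (const char *)start;
}

/* Decode a 64-bit note descriptor. The strings in *m point into desc,
 * so desc must outlive m. Returns 0 on success, -1 if malformed. */
int sdt_parse_desc64(const unsigned char *desc, size_t len,
		     struct sdt_marker *m)
{
	uint64_t addr[3];
	size_t pos = sizeof(addr);

	assert(desc && m);
	if (len < sizeof(addr))
		return -1;
	memcpy(addr, desc, sizeof(addr));	/* host byte order assumed */
	m->pc = addr[0];
	m->base = addr[1];
	m->semaphore = addr[2];

	m->provider = next_str(desc, len, &pos);
	m->name = next_str(desc, len, &pos);
	m->args = next_str(desc, len, &pos);
	if (!m->provider || !m->name || !m->args)
		return -1;
	return 0;
}
```

The descriptor bytes themselves would come from walking the notes in
".note.stapsdt" (for example with libelf) and picking out the notes
whose owner name is "stapsdt".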
Thanks
Hemant Kumar Shaw