RE: [RFC] BTS based perf user callchains

From: Metzger, Markus T
Date: Tue Aug 03 2010 - 02:54:33 EST

>-----Original Message-----
>From: Frederic Weisbecker [mailto:fweisbec@xxxxxxxxx]
>Sent: Monday, August 02, 2010 8:35 PM
>To: Ingo Molnar; Peter Zijlstra; Arnaldo Carvalho de Melo; Paul Mackerras; Stephane Eranian; Metzger,
>Markus T; Robert Richter
>Subject: [RFC] BTS based perf user callchains
>As you may know, there is an issue with user stacktraces: they require
>userspace apps to be built with frame pointers.

What it really requires is DWARF information that correctly describes how to
unwind each frame. You can also generate ESP-based frames (i.e. build without
frame pointers) and still get a correct backtrace, provided you have that debug
information.

>So there is something we can try: dump a piece of the top user stack page
>each time an event hits, and let the tools deal with that later using
>the DWARF information.
>But before trying that, which might require heavy copies, I would like to
>try something based on BTS. The idea is to look at the branch buffer and
>only pick addresses of branches that originated from "call" instructions.
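A minimal sketch of the first idea (copying part of the topmost user stack page at each event hit so that the tools can unwind it later with DWARF). The helper names stack_snapshot_len() and take_stack_snapshot() are hypothetical, and the sketch assumes a downward-growing stack, a power-of-two page size, and that only the page containing the stack pointer is copied:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of bytes to copy from the stack pointer up
 * to the end of the page that contains it. The x86 stack grows down, so
 * the live frames are at addresses >= sp; stopping at the page end keeps
 * the copy inside the page that holds sp. */
static size_t stack_snapshot_len(uintptr_t sp, size_t want, size_t page_size)
{
    /* Page size is assumed to be a power of two (4096 on x86). */
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

    uintptr_t page_end = (sp & ~(uintptr_t)(page_size - 1)) + page_size;
    size_t avail = (size_t)(page_end - sp);
    return want < avail ? want : avail;
}

/* Hypothetical sample step: copy the snapshot into 'out' (at least 'want'
 * bytes long) and return its length. The tools would later unwind it
 * offline using the binary's DWARF CFI and the sampled registers. */
static size_t take_stack_snapshot(const void *sp, size_t want,
                                  size_t page_size, unsigned char *out)
{
    size_t n = stack_snapshot_len((uintptr_t)sp, want, page_size);
    memcpy(out, sp, n);
    return n;
}
```

With sp = 0x7fff1f00 and 4 KiB pages, at most 0x100 bytes are copied, however much was requested; this bounds the "heavy copies" concern per sample.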

You would also need to track returns.
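To illustrate why returns matter: replaying a branch trace with calls alone would leave every function that has already returned on the reconstructed stack. A minimal sketch, assuming each branch record has already been classified as call, return or other; struct branch_rec and replay_callchain() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical classified branch record. */
enum branch_kind { BR_OTHER, BR_CALL, BR_RET };

struct branch_rec {
    uint64_t from;              /* address of the branch instruction */
    uint64_t to;                /* branch target */
    enum branch_kind kind;
};

/* Replay a branch trace (oldest first): each call pushes its call site,
 * each return pops one. Returns that would pop below the start of the
 * trace are ignored; they belong to frames entered before the trace
 * began, which a branch trace cannot recover. Frames beyond max_depth are
 * counted but not stored. Returns the number of entries in 'stack',
 * outermost first. */
static size_t replay_callchain(const struct branch_rec *recs, size_t n,
                               uint64_t *stack, size_t max_depth)
{
    size_t depth = 0;

    for (size_t i = 0; i < n; i++) {
        if (recs[i].kind == BR_CALL) {
            if (depth < max_depth)
                stack[depth] = recs[i].from;
            depth++;
        } else if (recs[i].kind == BR_RET && depth > 0) {
            depth--;
        }
    }
    return depth < max_depth ? depth : max_depth;
}
```

For the trace "call 0x10, call 0x20, ret, call 0x30" this yields [0x10, 0x30]; counting calls only would wrongly report 0x20 as a live frame.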

>So we want BTS activated only in the user ring, without the need for interrupts
>once we reach the limit of the buffer: we can just run in a kind of live
>mode and read on demand. This could be a secondary perf event that has no mmap
>buffer, something used only internally by the kernel on behalf of other, true
>perf events in a given context. Primary perf events can then read this BTS
>buffer when they want.
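The live mode described above amounts to reading the newest records out of a wrapping buffer on demand. A sketch under stated assumptions: the record layout is assumed to match the three 64-bit words (from, to, flags) used by the perf BTS code, the hardware is assumed to wrap instead of raising an interrupt, and the reader is assumed to know the current write index and whether a wrap has happened. struct bts_ring and bts_read_recent() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 64-bit BTS record layout: source, destination, flags. */
struct bts_record {
    uint64_t from;
    uint64_t to;
    uint64_t flags;
};

/* Hypothetical view of a wrapping BTS buffer. */
struct bts_ring {
    const struct bts_record *base;
    size_t cap;                 /* number of record slots */
    size_t next;                /* slot the hardware writes next */
    int wrapped;                /* nonzero once the buffer has wrapped */
};

/* Copy up to 'want' of the most recent records into 'out', oldest first,
 * and return how many were copied. The caller is assumed to have paused
 * BTS on this cpu so the hardware is not writing while we copy. */
static size_t bts_read_recent(const struct bts_ring *r,
                              struct bts_record *out, size_t want)
{
    assert(r->next < r->cap);

    size_t avail = r->wrapped ? r->cap : r->next;
    size_t n = want < avail ? want : avail;
    /* Start n slots behind 'next', wrapping around the buffer. */
    size_t start = (r->next + r->cap - n) % r->cap;

    for (size_t i = 0; i < n; i++)
        out[i] = r->base[(start + i) % r->cap];
    return n;
}
```

A primary event's overflow handler could, in this model, call something like bts_read_recent() to grab the last few branches without the secondary event ever needing an interrupt.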
>Now there are two ways:
>- record the whole branch buffer each time another perf event overflows,
>and let userspace post-processing deal with "call" instruction filtering to
>build the stacktrace on top of the branch trace.

If you only care about the backtrace, there will be too much noise in the data.
I doubt that you will get a very deep backtrace.

On the other hand, the trace data might be useful for other purposes. But
then, what you would want is BTS and perf events collected in the same buffer.

>- do the "call" filtering at record time. That requires inspecting each
>recorded branch and looking at the instruction content from the fast path.
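Record-time filtering would mean decoding, for each branch, the instruction at its source address. A simplified sketch of that decode step for x86-64 user code, recognizing CALL (E8, FF /2, FF /3) and RET (C3, C2, CB, CA) opcodes after skipping legacy and REX prefixes. How the bytes are fetched safely from the fast path is not shown, and classify_branch() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum branch_kind { BR_OTHER, BR_CALL, BR_RET };

static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2e: case 0x36: case 0x3e:   /* segment overrides */
    case 0x64: case 0x65:                         /* FS/GS overrides */
    case 0x66: case 0x67:                         /* operand/address size */
    case 0xf0: case 0xf2: case 0xf3:              /* LOCK, REPNE/BND, REP */
        return 1;
    default:
        return 0;
    }
}

/* Classify the instruction at a branch source from up to 'len' of its
 * bytes. 64-bit mode is assumed, so 0x40-0x4f is a REX prefix and the
 * far CALL ptr16:32 form (9A) cannot occur. */
static enum branch_kind classify_branch(const uint8_t *insn, size_t len)
{
    assert(insn != NULL || len == 0);

    size_t i = 0;
    while (i < len && is_legacy_prefix(insn[i]))
        i++;
    if (i < len && (insn[i] & 0xf0) == 0x40)      /* REX must come last */
        i++;
    if (i >= len)
        return BR_OTHER;

    uint8_t op = insn[i];
    if (op == 0xe8)                               /* CALL rel32 */
        return BR_CALL;
    if (op == 0xc3 || op == 0xc2 ||               /* near RET, RET imm16 */
        op == 0xcb || op == 0xca)                 /* far RET, RET imm16 */
        return BR_RET;
    if (op == 0xff && i + 1 < len) {              /* group 5: ModRM.reg selects */
        uint8_t reg = (uint8_t)((insn[i + 1] >> 3) & 7);
        if (reg == 2 || reg == 3)                 /* /2 near CALL, /3 far CALL */
            return BR_CALL;
    }
    return BR_OTHER;
}
```

Even this reduced decoder has to look at up to a dozen bytes per branch, which illustrates the fast-path cost concern.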

You can try to use LBR for that. Core i7 adds LBR filters that let you
record only calls and returns. You will be limited to a handful of records, but
I doubt that you will get much more out of a page of BTS.

With either approach, the backtrace will not be very deep. There is so much
traffic at the top of the stack that you won't find entries further down.
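A small model of why the backtrace stays shallow: if only the newest few call/return records are kept (16 here, an assumed LBR-like depth), a burst of leaf calls from the innermost frame evicts the records of the outer calls. struct lbr, LBR_DEPTH and simulate() are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Records are assumed to be filtered to calls and returns only. */
enum branch_kind { BR_CALL, BR_RET };

struct lbr_entry {
    uint64_t from;
    enum branch_kind kind;
};

#define LBR_DEPTH 16            /* assumed number of hardware entries */

struct lbr {
    struct lbr_entry e[LBR_DEPTH];
    size_t total;               /* entries ever recorded */
};

/* Record one branch, overwriting the oldest entry once full. */
static void lbr_record(struct lbr *l, uint64_t from, enum branch_kind kind)
{
    l->e[l->total % LBR_DEPTH] = (struct lbr_entry){ from, kind };
    l->total++;
}

/* Replay the surviving entries, oldest first, and return how many open
 * frames can be recovered from them. */
static size_t lbr_recoverable_depth(const struct lbr *l)
{
    size_t n = l->total < LBR_DEPTH ? l->total : LBR_DEPTH;
    size_t depth = 0;

    for (size_t k = l->total - n; k < l->total; k++) {
        if (l->e[k % LBR_DEPTH].kind == BR_CALL)
            depth++;
        else if (depth > 0)
            depth--;
    }
    return depth;
}

/* 'nest' nested calls, then 'leaf_calls' call/return pairs issued from
 * the innermost frame. */
static size_t simulate(size_t nest, size_t leaf_calls)
{
    struct lbr l = { .total = 0 };

    for (size_t d = 0; d < nest; d++)
        lbr_record(&l, 0x1000 + d, BR_CALL);
    for (size_t c = 0; c < leaf_calls; c++) {
        lbr_record(&l, 0x2000 + c, BR_CALL);
        lbr_record(&l, 0x3000 + c, BR_RET);
    }
    return lbr_recoverable_depth(&l);
}
```

In this model, simulate(10, 3) still recovers all 10 frames, simulate(10, 4) recovers only 8, and simulate(10, 100) recovers none, although the real stack is 10 frames deep in every case.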

>I'm not even sure that will work. Also, while looking at the BTS implementation
>in perf, I see we have one BTS buffer per cpu. But that doesn't look right, as
>the code flow is not linear per cpu but per task. Hence I suspect we need
>one BTS buffer per task. But maybe someone tried that and encountered a

When BTS was stand-alone, there was one buffer per task. It now uses the perf
ring buffer. The per-cpu buffers are only used to collect the data; on context
switch or buffer overflow, the data is copied into the perf ring buffer.
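The per-cpu collection scheme described above (buffers that only collect, drained into the per-task perf output on context switch or overflow) can be modelled roughly as follows. This is a simplified sketch, not the actual perf code: struct cpu_bts, struct task_output, the buffer sizes, the function names and the "lost" counter for a full output buffer are all hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct bts_record { uint64_t from, to, flags; };

#define CPU_BUF_RECS  8         /* hypothetical per-cpu staging size */
#define TASK_BUF_RECS 64        /* hypothetical per-task output size */

/* Stand-in for the perf ring buffer of the traced task's event. */
struct task_output {
    struct bts_record rec[TASK_BUF_RECS];
    size_t len;
    size_t lost;                /* records dropped because output was full */
};

/* Per-cpu staging area the hardware writes into. */
struct cpu_bts {
    struct bts_record rec[CPU_BUF_RECS];
    size_t len;
    struct task_output *owner;  /* task running on this cpu, or NULL */
};

/* Drain the staging buffer into its owner's output. With no owner (an
 * untraced task), the records are simply discarded. */
static void cpu_bts_flush(struct cpu_bts *c)
{
    struct task_output *o = c->owner;

    if (o) {
        size_t room = TASK_BUF_RECS - o->len;
        size_t n = c->len < room ? c->len : room;
        memcpy(&o->rec[o->len], c->rec, n * sizeof(c->rec[0]));
        o->len += n;
        o->lost += c->len - n;
    }
    c->len = 0;
}

/* Buffer overflow case: drain before appending to a full staging buffer. */
static void cpu_bts_append(struct cpu_bts *c, struct bts_record r)
{
    if (c->len == CPU_BUF_RECS)
        cpu_bts_flush(c);
    c->rec[c->len++] = r;
}

/* Context switch case: drain what the previous task produced, then hand
 * the cpu buffer to the next task. */
static void cpu_bts_switch(struct cpu_bts *c, struct task_output *next)
{
    cpu_bts_flush(c);
    c->owner = next;
}
```

Because every record is drained to the task that was running when it was written, the per-task output stays linear even though collection happens per cpu.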


