[PATCH V7 00/10] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing

From: Adrian Hunter
Date: Thu Jul 09 2015 - 09:17:28 EST


Hi

Here are the V7 patches introducing an abstraction for
using the AUX area and instruction tracing. The patches for
AUX area support have already been applied, leaving only the
patches for Intel PT and Intel BTS.

The patches can also be found here:

http://git.infradead.org/users/ahunter/linux-perf.git

An example (unchanged from V3) perf.data file and build id archive
can be found here:

http://git.infradead.org/~ahunter/tfr/

There is also a tar of the 3 most relevant files with debugging
symbols. These need to be placed under the correct paths in
/usr/lib/debug to get symbols.

Changes in V7:

Patches already applied:
perf db-export: Fix thread ref-counting
perf tools: Ensure thread-stack is flushed
perf tools: Allow auxtrace data alignment

perf tools: Add Intel PT instruction decoder
Copy the x86 instruction decoder into perf tools source

perf tools: Add Intel PT decoder
Fix Intel PT getting stuck in a loop

Check for being stuck in a loop. That can happen if a
decoder error results in the decoder erroneously setting
the IP to an address that is itself in an infinite loop
that consumes no packets. The only way to be in a loop
that consumes no packets is if it consists of unconditional
branches. So the decoder is considered stuck if it sees
a repeating cycle of consecutive unconditional branches.
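A minimal sketch of such a check, assuming the decoder passes in the
target IP of each consecutive unconditional branch that consumed no
packets, and resets the state whenever it consumes a packet. Assuming
each such branch has a fixed target, the next IP depends only on the
current one, so a repeated IP means the walk will repeat forever. The
names (struct loop_check, loop_check_branch()) are hypothetical and this
is not the decoder's actual code; Brent's period-doubling cycle detection
is used as an illustrative choice.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical loop-check state (not perf's struct). Reset it whenever
 * the decoder consumes a packet.
 */
struct loop_check {
	uint64_t saved_ip;	/* IP remembered at the start of the period */
	uint64_t period;	/* steps before saved_ip is moved; doubles */
	uint64_t count;		/* steps taken in the current period */
	bool active;		/* false until the first branch is seen */
};

static void loop_check_reset(struct loop_check *lc)
{
	lc->saved_ip = 0;
	lc->period = 1;
	lc->count = 0;
	lc->active = false;
}

/*
 * Feed the target IP of each consecutive unconditional branch that
 * consumed no packets. Returns true once an IP repeats, i.e. the
 * decoder is stuck in a cycle (Brent's period-doubling method).
 */
static bool loop_check_branch(struct loop_check *lc, uint64_t ip)
{
	if (!lc->active) {
		lc->saved_ip = ip;
		lc->active = true;
		return false;
	}
	if (ip == lc->saved_ip)
		return true;
	if (++lc->count >= lc->period) {
		/* Move the reference point forward and double the window */
		lc->saved_ip = ip;
		lc->period *= 2;
		lc->count = 0;
	}
	return false;
}
```

With a three-branch cycle 0x10 -> 0x20 -> 0x30 -> 0x10, the seventh call
reports the cycle. Period doubling keeps the detection delay proportional
to the tail plus cycle length without storing a history of IPs.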

perf tools: Add Intel PT support
Add missing err check in intel-pt.
Fix missing thread__puts
Improve Intel PT sync to sideband events

To help synchronize trace data with sideband events
the timestamp when returning to userspace is estimated.

Previously that was not done when switch information
was not available, but it is still useful for sync'ing
to mmap changes, so simplify by always doing it when
TSC is available. Also add log prints to help debug
synchronization to sideband.

Improve Intel PT timestamp estimation

Intel PT uses timestamps to synchronize sideband information
to trace data. However, timestamps may not be frequent enough.
To improve accuracy, an estimated timestamp is calculated based
on the number of instructions executed since the last known
timestamp.

This patch improves that estimate by taking into account the CPU
frequency as represented by the Intel PT CBR (core-to-bus ratio)
packet.
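The CBR adjustment can be sketched as follows, assuming the TSC ticks at
the rate of a fixed reference ratio (tsc_ratio here) and that the core
averages a constant number of cycles per instruction. Both that constant
and the function name estimate_timestamp() are hypothetical; the actual
decoder's formula may differ. A lower CBR means each core cycle spans more
TSC ticks, so the same instruction count advances the estimate further.

```c
#include <stdint.h>

/* Hypothetical: assumed average core cycles per instruction. */
#define ASSUMED_CYCLES_PER_INSN 1

/*
 * Estimate the current TSC value from the last known timestamp and the
 * number of instructions executed since then.
 */
static uint64_t estimate_timestamp(uint64_t last_tsc, uint64_t insn_cnt,
				   unsigned int cbr, unsigned int tsc_ratio)
{
	uint64_t core_cycles = insn_cnt * ASSUMED_CYCLES_PER_INSN;

	/* Without CBR information, assume the core runs at TSC frequency */
	if (!cbr || !tsc_ratio)
		return last_tsc + core_cycles;

	/* One core cycle lasts tsc_ratio / cbr TSC ticks */
	return last_tsc + core_cycles * tsc_ratio / cbr;
}
```

Multiplying before dividing keeps integer precision for ratios that do
not divide evenly.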


perf tools: Add Intel BTS support
Fix missing thread__puts
Add a fix for an infinite loop in intel_bts_process_buffer() that
was misplaced in a follow-up patch of the original patchkit

perf tools: Output sample flags and insn_len from intel_pt
Folded into: perf tools: Add Intel PT support

perf tools: Output sample flags and insn_len from intel_bts
Folded into: perf tools: Add Intel BTS support

perf tools: Intel PT to always update thread stack trace number
Folded into: perf tools: Add Intel PT support

perf tools: Intel BTS to always update thread stack trace number
Folded into: perf tools: Add Intel BTS support

Changes in V6:

Some minor expansion of commit messages.

Patches already applied:
perf tools: Disallow PMU events intel_pt and intel_bts until there is support

perf db-export: Fix thread ref-counting
New patch

perf tools: Ensure thread-stack is flushed
New patch

perf tools: Add Intel PT support
Support thread ref-counting

perf tools: Add Intel PT decoder
Fix a bug: FUP packet in PSB to update last IP

perf tools: Take Intel PT into use
Add Overview and Quickstart sections to intel_pt.txt

perf tools: Add Intel BTS support
Add Overview to intel_bts.txt
Support thread ref-counting

perf tools: Add example call-graph script
Add documentation comments to scripts

Changes in V5:

Patches already applied:
perf report: Fix placement of itrace option in documentation
perf tools: Add AUX area tracing index
perf tools: Hit all build ids when AUX area tracing
perf tools: Add build option NO_AUXTRACE to exclude AUX area tracing
perf auxtrace: Add option to synthesize events for transactions
perf tools: Add support for PERF_RECORD_AUX
perf tools: Add support for PERF_RECORD_ITRACE_START
perf tools: Add AUX area tracing Snapshot Mode
perf record: Add AUX area tracing Snapshot Mode support

perf tools: Disallow PMU events intel_pt and intel_bts until there is support
New patch

perf tools: Add Intel PT decoder
Style improvements pointed out by Acme: aligning '=', single line initializing
Make use of zalloc() not malloc / memset
Make use of zfree
Map internal error codes to fixed constants for output
Change intel_pt_error_message() to intel_pt__strerror()
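Mapping internal error codes to fixed constants decouples what is printed
from whatever errno values the decoder happens to use internally. A
minimal sketch in the spirit of intel_pt__strerror(); the enum, table,
messages, errno choices and function names below are illustrative
assumptions, not the patch's actual definitions.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed error constants used for output */
enum sketch_pt_err {
	SKETCH_ERR_NOMEM = 1,
	SKETCH_ERR_BADPKT,
	SKETCH_ERR_NODATA,
	SKETCH_ERR_UNK,
	SKETCH_ERR_MAX,
};

static const char *sketch_err_msgs[] = {
	[SKETCH_ERR_NOMEM]  = "Memory allocation failed",
	[SKETCH_ERR_BADPKT] = "Bad packet",
	[SKETCH_ERR_NODATA] = "No more data",
	[SKETCH_ERR_UNK]    = "Unknown error",
};

/* Map an internal negative errno-style code to a fixed constant */
static int sketch_map_err(int err)
{
	switch (err) {
	case -ENOMEM:
		return SKETCH_ERR_NOMEM;
	case -EBADMSG:
		return SKETCH_ERR_BADPKT;
	case -ENODATA:
		return SKETCH_ERR_NODATA;
	default:
		return SKETCH_ERR_UNK;
	}
}

/* strerror-style lookup into a caller-supplied buffer */
static int sketch_strerror(int code, char *buf, size_t buflen)
{
	if (code < 1 || code >= SKETCH_ERR_MAX)
		code = SKETCH_ERR_UNK;
	snprintf(buf, buflen, "%s", sketch_err_msgs[code]);
	return 0;
}
```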

perf tools: Add Intel PT support
Make use of zfree

perf tools: Take Intel PT into use
Allow "intel_pt" PMU to be selected as an event

perf tools: Add Intel BTS support
Allow "intel_bts" PMU to be selected as an event
Make use of zfree
Map internal error codes to fixed constants for output
Let "intel_bts" show up in 'perf list'

perf tools: Output sample flags and insn_len from intel_bts
Map internal error codes to fixed constants for output

Changes in V4:

perf tools: Amend mmap ref counting for the AUX area mmap
Dropped because already applied

perf script: Always allow fields 'addr' and 'cpu' for auxtrace
Dropped because already applied

perf report: Add Instruction Tracing support
Dropped because already applied

perf report: Fix placement of itrace option in documentation
New patch

perf tools: Add AUX area tracing index
Change size checks for more flexibility i.e.
- don't mind if an indexed auxtrace_event is bigger than
struct auxtrace_event
- don't mind if the auxtrace index does not fill the whole
file section
Rename 'index' variable to 'ent' to avoid build errors on
older gcc
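The relaxed size checks can be illustrated with two small predicates,
assuming a tool that knows a fixed record size (48 bytes here, an
illustrative value) and an index section made of fixed-size entries. The
function names are hypothetical. Accepting larger records lets a tool
read files whose records carry extra trailing fields, and accepting
leftover space tolerates a section that is bigger than its entries need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative size of the record layout this tool understands */
#define KNOWN_EVENT_SIZE 48

/* Strict check: the record must be exactly the known size */
static bool event_size_ok_strict(size_t ev_size)
{
	return ev_size == KNOWN_EVENT_SIZE;
}

/* Flexible check: larger records are fine; only the known prefix is read */
static bool event_size_ok(size_t ev_size)
{
	return ev_size >= KNOWN_EVENT_SIZE;
}

/*
 * Flexible index check: the nr entries must fit in the section, but the
 * section may be larger than they need. Dividing avoids overflow in
 * nr * entry_size.
 */
static bool index_fits_section(uint64_t nr, uint64_t entry_size,
			       uint64_t section_size)
{
	return entry_size && nr <= section_size / entry_size;
}
```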

perf tools: Add build option NO_AUXTRACE to exclude AUX area tracing
Fix whitespace alignment of NO_AUXTRACE=1
Add NO_AUXTRACE=1 to make_minimal

perf tools: Add support for PERF_RECORD_AUX
Expand commit message

perf tools: Add AUX area tracing Snapshot Mode
Whitespace fixups

perf record: Add AUX area tracing Snapshot Mode support
Whitespace fixups
Don't init static variables to 0 or NULL

perf tools: Add Intel PT packet decoder
Whitespace fixups

perf tools: Add Intel PT instruction decoder
Avoid build error on older (broken) gcc by adding -Wno-override-init
Avoid build errors due to unusual locale collation sequences i.e. use LC_COLLATE=C etc

perf tools: Add Intel PT decoder
Avoid build errors initializing structures to 0

perf tools: Add Intel PT support
Avoid build errors initializing structures to 0
Allow for perf_pmu__config_terms() having an extra parameter now
Allow for parse_events() having an extra parameter now
Rename 'div' variable to 'd' to avoid build errors
Whitespace fixup
Remove a couple of unused enums

perf tools: Add Intel BTS support
Avoid build errors initializing structures to 0
Allow for parse_events() having an extra parameter now

perf tools: Put itrace options into an asciidoc include
New patch

Changes in V3:

New patch:
perf tools: Amend mmap ref counting for the AUX area mmap

Move some code under arch:
perf tools: Add Intel PT support
perf tools: Add Intel BTS support

Updated documentation:
perf report: Add Instruction Tracing support
perf auxtrace: Add option to synthesize events for transactions
perf tools: Take Intel PT into use
perf tools: Add Intel BTS support

Patches already applied:
perf header: Add AUX area tracing feature
perf evlist: Add support for mmapping an AUX area buffer
perf tools: Add user events for AUX area tracing
perf tools: Add support for AUX area recording
perf record: Add basic AUX area tracing support
perf record: Extend -m option for AUX area tracing mmap pages
perf tools: Add a user event for AUX area tracing errors
perf session: Add hooks to allow transparent decoding of AUX area tracing data
perf session: Add instruction tracing options
perf auxtrace: Add helpers for AUX area tracing errors
perf auxtrace: Add helpers for queuing AUX area tracing data
perf auxtrace: Add a heap for sorting AUX area tracing queues
perf auxtrace: Add processing for AUX area tracing events
perf auxtrace: Add a hashtable for caching
perf tools: Add member to struct dso for an instruction cache
perf script: Add Instruction Tracing support
perf inject: Re-pipe AUX area tracing events
perf inject: Add Instruction Tracing support
perf script: Add field option 'flags' to print sample flags
perf tools: Add aux_watermark member of struct perf_event_attr

Changes in V2:

Get rid of MIN()
perf auxtrace: Add helpers for AUX area tracing errors
perf inject: Re-pipe AUX area tracing events
perf tools: Add build option NO_AUXTRACE to exclude AUX area tracing


Intel BTS can be used on most recent Intel CPUs. Intel PT
is available on Broadwell.

Examples:

Trace 'ls' with Intel BTS userspace only

perf record --per-thread -e intel_bts//u ls
perf report
perf script

Trace 'ls' with Intel BTS kernel and userspace

~/libexec/perf-core/perf-with-kcore record bts-ls --per-thread -e intel_bts// -- ls
~/libexec/perf-core/perf-with-kcore report bts-ls
~/libexec/perf-core/perf-with-kcore script bts-ls

Trace 'ls' with Intel PT userspace only

perf record -e intel_pt//u ls
perf report
perf script

Trace 'ls' with Intel PT kernel and userspace

~/libexec/perf-core/perf-with-kcore record pt-ls -e intel_pt// -- ls
~/libexec/perf-core/perf-with-kcore report pt-ls
~/libexec/perf-core/perf-with-kcore script pt-ls


The abstraction has two separate aspects:
1. recording AUX area data
2. processing AUX area data

Recording consists of mmapping a separate buffer, the AUX area
buffer, and copying its data into the perf.data file. Each chunk
of data is preceded by a new user event, PERF_RECORD_AUXTRACE.
The data is too big to fit in the event itself, so it follows
immediately afterward, and session processing has to skip over
it to reach the next event header, in a similar fashion to the
existing PERF_RECORD_HEADER_TRACING_DATA event. The main
recording patches are:

perf evlist: Add support for mmapping an AUX area buffer
perf tools: Add user events for AUX area tracing
perf tools: Add support for AUX area recording
perf record: Add basic AUX area tracing support
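The record layout described above can be illustrated with a simplified
walker. The structures are stand-ins that keep only the fields needed
here (type and record size in the header, plus the size of the trailing
AUX data); they are not perf's exact definitions, and the type value 71
is assumed to match PERF_RECORD_AUXTRACE. The point is that the AUX data
size is added to the record size when stepping to the next header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RECORD_AUXTRACE 71	/* assumed PERF_RECORD_AUXTRACE value */

/* Simplified stand-in for an event header */
struct sketch_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;		/* record size, excluding trailing AUX data */
};

/* Simplified stand-in for the AUXTRACE record */
struct sketch_auxtrace_event {
	struct sketch_header header;
	uint64_t size;		/* number of AUX data bytes after the record */
};

/* Count events in a buffer, skipping AUX data; -1 if truncated/corrupt */
static int count_events(const unsigned char *buf, size_t len)
{
	size_t pos = 0;
	int n = 0;

	while (pos < len) {
		struct sketch_header hdr;
		size_t skip;

		if (len - pos < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + pos, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - pos)
			return -1;
		skip = hdr.size;
		if (hdr.type == SKETCH_RECORD_AUXTRACE) {
			struct sketch_auxtrace_event ev;

			if (hdr.size < sizeof(ev))
				return -1;
			memcpy(&ev, buf + pos, sizeof(ev));
			/* The AUX data is not part of the record; it follows it */
			if (ev.size > len - pos - hdr.size)
				return -1;
			skip += ev.size;
		}
		pos += skip;
		n++;
	}
	return n;
}
```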

Processing consists of providing hooks in session processing
to enable a decoder to see all the events and deliver synthesized
events transparently into the event stream. The main processing
patch is:

perf session: Add hooks to allow transparent decoding of AUX area tracing data
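A simplified model of such a hook: the session loop offers every event to
an optional decoder callback, which can deliver synthesized events into
the same stream before the original event is passed on to the tool.
Everything here (struct sketch_decoder, process_session(), the event type
numbers) is hypothetical and much simpler than perf's session code; it
only shows how a tool can receive decoded samples without knowing the
trace format.

```c
#include <stddef.h>

/* Simplified event: only a type. Not perf's union perf_event. */
struct sketch_event {
	int type;
};

typedef void (*deliver_fn)(void *tool, const struct sketch_event *ev);

/* Hypothetical decoder hook: sees every event, may deliver new ones */
struct sketch_decoder {
	void (*process_event)(const struct sketch_event *ev,
			      deliver_fn deliver, void *tool);
};

static void process_session(const struct sketch_event *evs, size_t n,
			    const struct sketch_decoder *dec,
			    deliver_fn deliver, void *tool)
{
	for (size_t i = 0; i < n; i++) {
		/* Decoder looks first, so synthesized events land in-stream */
		if (dec && dec->process_event)
			dec->process_event(&evs[i], deliver, tool);
		deliver(tool, &evs[i]);
	}
}

/* Example decoder: synthesize one "sample" (type 100) per type-1 event */
static void demo_decoder(const struct sketch_event *ev,
			 deliver_fn deliver, void *tool)
{
	if (ev->type == 1) {
		struct sketch_event synth = { .type = 100 };

		deliver(tool, &synth);
	}
}

/* Example tool: counts all events and the synthesized ones */
struct demo_counts {
	int total;
	int synthesized;
};

static void demo_deliver(void *tool, const struct sketch_event *ev)
{
	struct demo_counts *c = tool;

	c->total++;
	if (ev->type == 100)
		c->synthesized++;
}
```

Without a decoder (dec == NULL) the same loop delivers only the original
events, which is what makes the decoding transparent to the tool.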


Adrian Hunter (10):
perf auxtrace: Add Intel PT as an AUX area tracing type
perf tools: Add Intel PT packet decoder
perf tools: Add Intel PT instruction decoder
perf tools: Add Intel PT log
perf tools: Add Intel PT decoder
perf tools: Add Intel PT support
perf tools: Take Intel PT into use
perf tools: Add Intel BTS support
perf tools: Put itrace options into an asciidoc include
perf tools: Add example call-graph script

tools/build/Makefile.build | 2 +
tools/perf/.gitignore | 1 +
tools/perf/Documentation/intel-bts.txt | 86 +
tools/perf/Documentation/intel-pt.txt | 588 ++++++
tools/perf/Documentation/itrace.txt | 22 +
tools/perf/Documentation/perf-inject.txt | 23 +-
tools/perf/Documentation/perf-report.txt | 23 +-
tools/perf/Documentation/perf-script.txt | 23 +-
tools/perf/Makefile.perf | 12 +-
tools/perf/arch/x86/util/Build | 5 +
tools/perf/arch/x86/util/auxtrace.c | 83 +
tools/perf/arch/x86/util/intel-bts.c | 458 +++++
tools/perf/arch/x86/util/intel-pt.c | 752 ++++++++
tools/perf/arch/x86/util/pmu.c | 18 +
.../scripts/python/call-graph-from-postgresql.py | 327 ++++
tools/perf/scripts/python/export-to-postgresql.py | 47 +
tools/perf/util/Build | 3 +
tools/perf/util/auxtrace.c | 9 +-
tools/perf/util/auxtrace.h | 2 +
tools/perf/util/intel-bts.c | 933 ++++++++++
tools/perf/util/intel-bts.h | 43 +
tools/perf/util/intel-pt-decoder/Build | 11 +
.../util/intel-pt-decoder/gen-insn-attr-x86.awk | 387 ++++
tools/perf/util/intel-pt-decoder/inat.c | 97 +
tools/perf/util/intel-pt-decoder/inat.h | 221 +++
tools/perf/util/intel-pt-decoder/inat_types.h | 29 +
tools/perf/util/intel-pt-decoder/insn.c | 594 ++++++
tools/perf/util/intel-pt-decoder/insn.h | 201 ++
.../perf/util/intel-pt-decoder/intel-pt-decoder.c | 1816 +++++++++++++++++++
.../perf/util/intel-pt-decoder/intel-pt-decoder.h | 104 ++
.../util/intel-pt-decoder/intel-pt-insn-decoder.c | 246 +++
.../util/intel-pt-decoder/intel-pt-insn-decoder.h | 65 +
tools/perf/util/intel-pt-decoder/intel-pt-log.c | 155 ++
tools/perf/util/intel-pt-decoder/intel-pt-log.h | 52 +
.../util/intel-pt-decoder/intel-pt-pkt-decoder.c | 400 ++++
.../util/intel-pt-decoder/intel-pt-pkt-decoder.h | 64 +
.../perf/util/intel-pt-decoder/x86-opcode-map.txt | 970 ++++++++++
tools/perf/util/intel-pt.c | 1911 ++++++++++++++++++++
tools/perf/util/intel-pt.h | 51 +
tools/perf/util/pmu.c | 4 -
40 files changed, 10765 insertions(+), 73 deletions(-)
create mode 100644 tools/perf/Documentation/intel-bts.txt
create mode 100644 tools/perf/Documentation/intel-pt.txt
create mode 100644 tools/perf/Documentation/itrace.txt
create mode 100644 tools/perf/arch/x86/util/auxtrace.c
create mode 100644 tools/perf/arch/x86/util/intel-bts.c
create mode 100644 tools/perf/arch/x86/util/intel-pt.c
create mode 100644 tools/perf/arch/x86/util/pmu.c
create mode 100644 tools/perf/scripts/python/call-graph-from-postgresql.py
create mode 100644 tools/perf/util/intel-bts.c
create mode 100644 tools/perf/util/intel-bts.h
create mode 100644 tools/perf/util/intel-pt-decoder/Build
create mode 100644 tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
create mode 100644 tools/perf/util/intel-pt-decoder/inat.c
create mode 100644 tools/perf/util/intel-pt-decoder/inat.h
create mode 100644 tools/perf/util/intel-pt-decoder/inat_types.h
create mode 100644 tools/perf/util/intel-pt-decoder/insn.c
create mode 100644 tools/perf/util/intel-pt-decoder/insn.h
create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.c
create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.h
create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
create mode 100644 tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
create mode 100644 tools/perf/util/intel-pt.c
create mode 100644 tools/perf/util/intel-pt.h


Regards
Adrian