[GIT PULL 00/68] perf/core improvements and fixes

From: Arnaldo Carvalho de Melo
Date: Tue Oct 11 2016 - 13:35:38 EST


Hi Ingo,

Please consider pulling,

- Arnaldo

Build and test stats at the end of the message.

The following changes since commit c68306ce20ad03ce655a367fc33ad06e12bb87a6:

Merge tag 'perf-core-for-mingo-20161005' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent (2016-10-07 00:36:49 +0200)

are available in the git repository at:

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-core-for-mingo-20161011

for you to fetch changes up to 193b29e31a5cfec42790a59fc453359bb6ee0ea1:

perf jevents: Handle events including .c and .o (2016-10-11 12:34:39 -0300)

----------------------------------------------------------------
perf/core improvements and fixes:

New features:

- The 'perf c2c' tool provides means for Shared Data C2C/HITM analysis.
It allows you to track down cacheline contention. The tool is based
on x86's load latency and precise store facility events provided by
Intel CPUs.

It was tested by Joe Mario and has proven to be useful, finding some
cacheline contentions. Joe also wrote a blog post about the c2c tool,
with examples:

https://joemario.github.io/blog/2016/09/01/c2c-blog/

There one finds extensive details on using the tool, with tips on
reducing the volume of samples while still capturing enough to do
its job. (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa)

- Add support in 'perf list' to show only events in vendor notation,
built from JSON (Andi Kleen)

- Handle completion of upper case event names, which is what users of
the JSON events are used to. Lowercase names also work. (Andi Kleen)

- Report Intel-PT/BTS instruction bytes in 'perf script' (Andi Kleen)
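
To make the 'perf c2c' idea more concrete, here is a minimal sketch of how
sampled data addresses can be grouped by cacheline and by offset within the
cacheline, which is roughly what the 'dcacheline' and 'offset' dimension keys
in the shortlog suggest. It is written in Python for brevity and is not taken
from builtin-c2c.c. The 64-byte cacheline size, the (address, is_hitm) sample
format and the sample addresses are all assumptions made for illustration.

```python
from collections import defaultdict

CACHELINE_SIZE = 64  # assumption: 64-byte cachelines, typical on x86


def cacheline_of(addr: int) -> int:
    """Base address of the cacheline containing addr."""
    return addr & ~(CACHELINE_SIZE - 1)


def offset_in_cacheline(addr: int) -> int:
    """Byte offset of addr within its cacheline."""
    return addr & (CACHELINE_SIZE - 1)


def group_hitms(samples):
    """Count HITM samples per cacheline and per offset within it.

    samples: iterable of (data_address, is_hitm) pairs (hypothetical format).
    Returns {cacheline_base: {offset: hitm_count}}.
    """
    table = defaultdict(lambda: defaultdict(int))
    for addr, is_hitm in samples:
        if is_hitm:
            table[cacheline_of(addr)][offset_in_cacheline(addr)] += 1
    return {line: dict(offsets) for line, offsets in table.items()}


# Made-up samples: two variables 8 bytes apart share one cacheline.
samples = [
    (0x601040, True),
    (0x601048, True),
    (0x601040, True),
    (0x601080, False),  # not a HITM, so it is not counted
]
hot = group_hitms(samples)
```

A cacheline that collects HITMs at more than one offset, as in the made-up
data here, is the kind of pattern that points at false sharing between
neighbouring variables.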

Fixes:

- Fix handling of numa nodes in perf.data files (Jiri Olsa)

- Fix scrolling when refreshing 'perf top --tui --hierarchy' entries (Namhyung Kim)

- Fix handling of events whose names include .c and .o, which were being
treated as BPF scripts instead of JSON events (Wang Nan)
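
One way to picture the .c/.o ambiguity: an event name in vendor notation can
contain ".c" or ".o" in the middle, while a BPF source or object file name
ends with it. The sketch below only illustrates that distinction in Python.
The real parser is a flex lexer and its rules may differ, and the function
names and token strings here are hypothetical.

```python
def looks_like_bpf_file(token: str) -> bool:
    """True only when the token ends with a BPF source/object suffix."""
    return token.endswith((".c", ".o"))


def classify(token: str) -> str:
    """Hypothetical classifier: 'bpf' for file-like tokens, else 'event'."""
    return "bpf" if looks_like_bpf_file(token) else "event"


# Illustrative tokens: the first two merely contain ".c" / ".o".
kinds = {
    token: classify(token)
    for token in ("uops_executed.core", "some_event.other", "filter.c", "prog.o")
}
```

With a rule that matches ".c" or ".o" anywhere in the token, the first two
names would be misread as BPF files, which is the kind of mistake the fix
addresses.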

Infrastructure:

- Sync copy of x86's syscall table (Arnaldo Carvalho de Melo)

- Prep work for making libtraceevent more widely used (Jiri Olsa)

- Show list of features not present in a perf.data file when using
'perf report --header-only', to help with debugging (Jiri Olsa)

- When failing to process a record, show its name, not its number (Jiri Olsa)
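
The shortlog below adds a for_each_clear_bit macro alongside the change that
displays missing features. Walking the clear bits of a feature bitmap is one
natural way to produce such a list. Whether header.c does exactly this is an
assumption here. The Python sketch below mimics the shape of such a loop. The
feature names are an illustrative subset and their bit positions are
hypothetical.

```python
def for_each_clear_bit(bitmap: int, size: int):
    """Yield the index of every zero bit in the low `size` bits of bitmap."""
    for bit in range(size):
        if not (bitmap >> bit) & 1:
            yield bit


# Illustrative subset of feature names; the indices are hypothetical.
FEATURES = ["TRACING_DATA", "BUILD_ID", "HOSTNAME", "OSRELEASE", "NUMA_TOPOLOGY"]


def missing_features(bitmap: int):
    """Names of the features whose bit is not set in bitmap."""
    return [FEATURES[bit] for bit in for_each_clear_bit(bitmap, len(FEATURES))]


# Bits 1, 2 and 3 are set, so features 0 and 4 are reported as missing.
missing = missing_features(0b01110)
```

Reporting names rather than bit numbers follows the same reasoning as the
last item above: a name is easier to act on when debugging than an index.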

Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>

----------------------------------------------------------------
Adrian Hunter (1):
perf intel-pt/bts: Tidy instruction buffer size usage

Andi Kleen (3):
perf list: Add support for listing only json events
perf tools: Handle completion of upper case events
perf intel-pt/bts: Report instruction bytes and length in sample

Arnaldo Carvalho de Melo (1):
perf tools: Sync copy of x86's syscall table

Jiri Olsa (61):
perf c2c: Introduce c2c_decode_stats function
perf c2c: Introduce c2c_add_stats function
perf c2c: Add c2c command
perf c2c: Add record subcommand
perf c2c: Add report subcommand
perf c2c report: Add dimension support
perf c2c report: Add sort_entry dimension support
perf c2c report: Fallback to standard dimensions
perf c2c report: Add sample processing
perf c2c report: Add cacheline hists processing
perf c2c report: Decode c2c_stats for hist entries
perf c2c report: Add header macros
perf c2c report: Add 'dcacheline' dimension key
perf c2c report: Add 'offset' dimension key
perf c2c report: Add 'iaddr' dimension key
perf c2c report: Add hitm related dimension keys
perf c2c report: Add stores related dimension keys
perf c2c report: Add loads related dimension keys
perf c2c report: Add llc and remote loads related dimension keys
perf c2c report: Add llc load miss dimension key
perf c2c report: Add total record sort key
perf c2c report: Add total loads sort key
perf c2c report: Add hitm percent sort key
perf c2c report: Add hitm/store percent related sort keys
perf c2c report: Add dram related sort keys
perf c2c report: Add 'pid' sort key
perf c2c report: Add 'tid' sort key
perf c2c report: Add 'symbol' and 'dso' sort keys
perf c2c report: Add 'node' sort key
perf c2c report: Add stats related sort keys
perf c2c report: Add 'cpucnt' sort key
perf c2c report: Add src line sort key
perf c2c report: Setup number of header lines for hists
perf c2c report: Set final resort fields
perf c2c report: Add stdio output support
perf c2c report: Add main TUI browser
perf c2c report: Add TUI cacheline browser
perf c2c report: Add global stats stdio output
perf c2c report: Add shared cachelines stats stdio output
perf c2c report: Add c2c related stats stdio output
perf c2c report: Allow to report callchains
perf c2c report: Limit the cachelines table entries
perf c2c report: Add support to choose local HITMs
perf c2c report: Allow to set cacheline sort fields
perf c2c report: Recalc width of global sort entries
perf c2c report: Add cacheline index entry
perf c2c report: Add support to manage symbol name length
perf c2c report: Iterate node display in browser
perf c2c report: Add help windows
perf c2c: Add man page and credits
tools lib traceevent: Add install_headers target
tools lib traceevent: Add do_install_mkdir Makefile function
tools lib traceevent: Rename LIB_FILE to LIB_TARGET
tools lib traceevent: Add version for traceevent shared object
tools lib: Add for_each_clear_bit macro
perf report: Move captured info to generic header info
perf header: Display missing features
perf header: Display feature name on write failure
perf header: Set nr_numa_nodes only when we parsed all the data
perf c2c report: Add --no-source option
perf c2c report: Add --show-all option

Namhyung Kim (1):
perf top: Fix refreshing hierarchy entries on TUI

Wang Nan (1):
perf jevents: Handle events including .c and .o

tools/include/asm-generic/bitops.h | 1 +
tools/include/asm-generic/bitops/__ffz.h | 12 +
tools/include/asm-generic/bitops/find.h | 28 +
tools/include/linux/bitops.h | 5 +
tools/lib/find_bit.c | 25 +
tools/lib/traceevent/Makefile | 40 +-
tools/perf/Build | 1 +
tools/perf/Documentation/perf-c2c.txt | 282 ++
tools/perf/Documentation/perf-list.txt | 2 +-
tools/perf/MANIFEST | 1 +
tools/perf/arch/x86/entry/syscalls/syscall_64.tbl | 4 +-
tools/perf/builtin-c2c.c | 2754 ++++++++++++++++++++
tools/perf/builtin-list.c | 9 +-
tools/perf/builtin.h | 1 +
tools/perf/perf-completion.sh | 6 +-
tools/perf/perf.c | 1 +
tools/perf/ui/browsers/hists.c | 5 +-
tools/perf/ui/browsers/hists.h | 1 +
tools/perf/util/event.h | 3 +
tools/perf/util/header.c | 21 +-
tools/perf/util/hist.c | 1 +
tools/perf/util/hist.h | 1 +
tools/perf/util/intel-bts.c | 9 +-
.../perf/util/intel-pt-decoder/intel-pt-decoder.c | 2 +
.../perf/util/intel-pt-decoder/intel-pt-decoder.h | 1 +
.../util/intel-pt-decoder/intel-pt-insn-decoder.c | 13 +-
.../util/intel-pt-decoder/intel-pt-insn-decoder.h | 6 +-
tools/perf/util/intel-pt-decoder/intel-pt-log.c | 4 +-
tools/perf/util/intel-pt.c | 19 +-
tools/perf/util/mem-events.c | 128 +
tools/perf/util/mem-events.h | 37 +
tools/perf/util/parse-events.c | 2 +-
tools/perf/util/parse-events.l | 4 +-
tools/perf/util/pmu.c | 14 +-
tools/perf/util/pmu.h | 3 +-
tools/perf/util/session.c | 10 -
tools/perf/util/sort.c | 2 +-
tools/perf/util/sort.h | 1 +
38 files changed, 3393 insertions(+), 66 deletions(-)
create mode 100644 tools/include/asm-generic/bitops/__ffz.h
create mode 100644 tools/perf/Documentation/perf-c2c.txt
create mode 100644 tools/perf/builtin-c2c.c

[root@jouet ~]# time dm
1 66.368836810 alpine:3.4: Ok
2 26.154146190 android-ndk:r12b-arm: Ok
3 69.746739126 archlinux:latest: Ok
4 39.624220291 centos:5: Ok
5 58.689782208 centos:6: Ok
6 69.851635081 centos:7: Ok
7 63.079827869 debian:7: Ok
8 68.955435266 debian:8: Ok
9 38.571431258 debian:experimental: Ok
10 69.558879497 fedora:20: Ok
11 73.092759654 fedora:21: Ok
12 72.443082285 fedora:22: Ok
13 72.305159323 fedora:23: Ok
14 77.316048256 fedora:24: Ok
15 32.774333511 fedora:24-x-ARC-uClibc: Ok
16 80.985293289 fedora:rawhide: Ok
17 79.388121697 mageia:5: Ok
18 72.485900821 opensuse:13.2: Ok
19 73.519405793 opensuse:42.1: Ok
20 81.367665352 opensuse:tumbleweed: Ok
21 56.263699207 ubuntu:12.04.5: Ok
22 38.300297066 ubuntu:14.04: Ok
23 68.467777551 ubuntu:14.04.4: Ok
24 70.120014470 ubuntu:15.10: Ok
25 69.392704717 ubuntu:16.04: Ok
26 68.643732518 ubuntu:16.04-x-arm: Ok
27 58.529762081 ubuntu:16.04-x-arm64: Ok
28 57.908570394 ubuntu:16.04-x-powerpc: Ok
29 58.354897750 ubuntu:16.04-x-powerpc64: Ok
30 60.598809333 ubuntu:16.04-x-powerpc64el: Ok
31 58.995355673 ubuntu:16.04-x-s390: Ok
32 74.705277358 ubuntu:16.10: Ok

real 33m47.198s
user 0m2.009s
sys 0m2.429s
[root@jouet ~]#

[acme@jouet linux]$ perf stat make -C tools/perf build-test
make: Entering directory '/home/acme/git/linux/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
make_util_pmu_bison_o_O: make util/pmu-bison.o
make_no_libelf_O: make NO_LIBELF=1
make_pure_O: make
make_no_libbionic_O: make NO_LIBBIONIC=1
make_no_libperl_O: make NO_LIBPERL=1
make_no_demangle_O: make NO_DEMANGLE=1
make_no_gtk2_O: make NO_GTK2=1
make_clean_all_O: make clean all
make_no_slang_O: make NO_SLANG=1
make_no_libnuma_O: make NO_LIBNUMA=1
make_debug_O: make DEBUG=1
make_util_map_o_O: make util/map.o
make_no_libpython_O: make NO_LIBPYTHON=1
make_no_newt_O: make NO_NEWT=1
make_no_scripts_O: make NO_LIBPYTHON=1 NO_LIBPERL=1
make_perf_o_O: make perf.o
make_install_prefix_O: make install prefix=/tmp/krava
make_install_bin_O: make install-bin
make_no_libdw_dwarf_unwind_O: make NO_LIBDW_DWARF_UNWIND=1
make_install_prefix_slash_O: make install prefix=/tmp/krava/
make_help_O: make help
make_no_backtrace_O: make NO_BACKTRACE=1
- /home/acme/git/linux/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC: cd . && make FEATURE_DUMP_COPY=/home/acme/git/linux/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC LDFLAGS='-static' feature-dump
cd . && make FEATURE_DUMP_COPY=/home/acme/git/linux/tools/perf/BUILD_TEST_FEATURE_DUMP_STATIC LDFLAGS='-static' feature-dump
make_static_O: make LDFLAGS=-static
make_doc_O: make doc
make_no_auxtrace_O: make NO_AUXTRACE=1
make_no_ui_O: make NO_NEWT=1 NO_SLANG=1 NO_GTK2=1
make_tags_O: make tags
make_with_babeltrace_O: make LIBBABELTRACE=1
make_minimal_O: make NO_LIBPERL=1 NO_LIBPYTHON=1 NO_NEWT=1 NO_GTK2=1 NO_DEMANGLE=1 NO_LIBELF=1 NO_LIBUNWIND=1 NO_BACKTRACE=1 NO_LIBNUMA=1 NO_LIBAUDIT=1 NO_LIBBIONIC=1 NO_LIBDW_DWARF_UNWIND=1 NO_AUXTRACE=1 NO_LIBBPF=1 NO_LIBCRYPTO=1 NO_SDT=1
make_install_O: make install
make_no_libunwind_O: make NO_LIBUNWIND=1
make_no_libbpf_O: make NO_LIBBPF=1
make_no_libaudit_O: make NO_LIBAUDIT=1
OK
make: Leaving directory '/home/acme/git/linux/tools/perf'
[acme@jouet linux]$

[root@jouet ~]# perf test
1: vmlinux symtab matches kallsyms : Ok
2: detect openat syscall event : Ok
3: detect openat syscall event on all cpus : Ok
4: read samples using the mmap interface : Ok
5: parse events tests : Ok
6: Validate PERF_RECORD_* events & perf_sample fields : Ok
7: Test perf pmu format parsing : Ok
8: Test dso data read : Ok
9: Test dso data cache : Ok
10: Test dso data reopen : Ok
11: roundtrip evsel->name check : Ok
12: Check parsing of sched tracepoints fields : Ok
13: Generate and check syscalls:sys_enter_openat event fields: Ok
14: struct perf_event_attr setup : Ok
15: Test matching and linking multiple hists : Ok
16: Try 'import perf' in python, checking link problems : Ok
17: Test breakpoint overflow signal handler : Ok
18: Test breakpoint overflow sampling : Ok
19: Test number of exit event of a simple workload : Ok
20: Test software clock events have valid period values : Ok
21: Test object code reading : Ok
22: Test sample parsing : Ok
23: Test using a dummy software event to keep tracking : Ok
24: Test parsing with no sample_id_all bit set : Ok
25: Test filtering hist entries : Ok
26: Test mmap thread lookup : Ok
27: Test thread mg sharing : Ok
28: Test output sorting of hist entries : Ok
29: Test cumulation of child hist entries : Ok
30: Test tracking with sched_switch : Ok
31: Filter fds with revents mask in a fdarray : Ok
32: Add fd to a fdarray, making it autogrow : Ok
33: Test kmod_path__parse function : Ok
34: Test thread map : Ok
35: Test LLVM searching and compiling :
35.1: Basic BPF llvm compiling test : Ok
35.2: Test kbuild searching : Ok
35.3: Compile source for BPF prologue generation test : Ok
35.4: Compile source for BPF relocation test : Ok
36: Test topology in session : Ok
37: Test BPF filter :
37.1: Test basic BPF filtering : Ok
37.2: Test BPF prologue generation : Ok
37.3: Test BPF relocation checker : Ok
38: Test thread map synthesize : Ok
39: Test cpu map synthesize : Ok
40: Test stat config synthesize : Ok
41: Test stat synthesize : Ok
42: Test stat round synthesize : Ok
43: Test attr update synthesize : Ok
44: Test events times : Ok
45: Test backward reading from ring buffer : Ok
46: Test cpu map print : Ok
47: Test SDT event probing : Ok
48: Test is_printable_array function : Ok
49: Test bitmap print : Ok
50: x86 rdpmc test : Ok
51: Test converting perf time to TSC : Ok
52: Test dwarf unwind : Ok
53: Test x86 instruction decoder - new instructions : Ok
54: Test intel cqm nmi context read : Skip
[root@jouet ~]#