[GIT PULL] perf changes for v4.4

From: Ingo Molnar
Date: Tue Nov 03 2015 - 05:02:29 EST


Please pull the latest perf-core-for-linus git tree from:

git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf-core-for-linus

# HEAD: bebd23a2ed31d47e7dd746d3b125068aa2c42d85 Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Kernel side changes:

- Improve accuracy of perf/sched clock on x86. (Adrian Hunter)

- Intel DS and BTS updates. (Alexander Shishkin)

- Intel cstate PMU support. (Kan Liang)

- Add group read support to perf_event_read(). (Peter Zijlstra)

- Branch call hardware sampling support, implemented on x86 and PowerPC.
(Stephane Eranian)

- Event groups transactional interface enhancements. (Sukadev Bhattiprolu)

- Enable proper x86/intel/uncore PMU support on multi-segment PCI systems.
(Taku Izumi)

- ... misc fixes and cleanups.

The perf tooling team was very busy again with 200+ commits, so the full diff doesn't
fit into lkml size limits. Here's an (incomplete) list of the tooling highlights:

New features:

- Change the default event used in all tools (record/top): use the most precise "cycles"
hw counter available, i.e. when the user doesn't specify any event, it will try using
cycles:ppp, cycles:pp, etc. and fall back transparently until it finds a working
counter. (Arnaldo Carvalho de Melo)

- Integration of perf with eBPF that, given an eBPF .c source file (or .o file
built for the 'bpf' target with clang), will get it automatically built, validated
and loaded into the kernel via the sys_bpf syscall, which can then be used and seen
using 'perf trace' and other tools. (Wang Nan)
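
As an illustration of the default-event fallback described in the first item above,
here is a minimal sketch, not the actual perf code (which is C). The function name
pick_default_event and the can_open callback are hypothetical, and the candidates
after cycles:pp are an assumption about what "etc." covers:

```python
from typing import Callable, Optional, Sequence

# Most precise first. The text names cycles:ppp and cycles:pp; the
# remaining two entries are assumed for illustration.
CANDIDATES = ("cycles:ppp", "cycles:pp", "cycles:p", "cycles")


def pick_default_event(can_open: Callable[[str], bool],
                       candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first candidate accepted by can_open(), or None.

    can_open is a hypothetical stand-in for "try to open this counter
    and report whether it worked".
    """
    for name in candidates:
        if can_open(name):
            return name
    return None


if __name__ == "__main__":
    # Pretend the hardware only supports the two least precise variants.
    supported = {"cycles:p", "cycles"}
    print(pick_default_event(lambda name: name in supported))
```

The caller never sees the failed attempts; it only gets the first variant that
works, which is what "fall back transparently" amounts to in this sketch.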

Various user interface improvements:

- Automatic pager invocation on long help output. (Namhyung Kim)

- Search for more options when passing args to -h, e.g.: (Arnaldo Carvalho de Melo)

$ perf report -h interface

Usage: perf report [<options>]

--gtk Use the GTK2 interface
--stdio Use the stdio interface
--tui Use the TUI interface

- Show ordered command line options when -h is used or when an
unknown option is specified. (Arnaldo Carvalho de Melo)

- If options are passed after -h, show just their descriptions, not
all options. (Arnaldo Carvalho de Melo)

- Implement column based horizontal scrolling in the hists browser (top, report),
making it possible to use the TUI for things like 'perf mem report' where
there are many more columns than can fit in a terminal. (Arnaldo Carvalho de Melo)

- Enhance the error reporting of tracepoint event parsing, e.g.:

$ oldperf record -e sched:sched_switc usleep 1
event syntax error: 'sched:sched_switc'
\___ unknown tracepoint
Run 'perf list' for a list of valid events

Now we get the much nicer:

$ perf record -e sched:sched_switc ls
event syntax error: 'sched:sched_switc'
\___ can't access trace events

Error: No permissions to read /sys/kernel/debug/tracing/events/sched/sched_switc
Hint: Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'

And after we have those mount point permissions fixed:

$ perf record -e sched:sched_switc ls
event syntax error: 'sched:sched_switc'
\___ unknown tracepoint

Error: File /sys/kernel/debug/tracing/events/sched/sched_switc not found.
Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.

I.e. the event parsing routine now uses the strerror_open()
routines introduced by, and used in, the 'perf trace' work. (Jiri Olsa)

- Fail properly when pattern matching fails to find a tracepoint, i.e.
'-e non:existent' was being correctly handled, with a proper error message
about that not being a valid event, but '-e non:existent*' wasn't,
fix it. (Jiri Olsa)

- Do event name substring search as last resort in 'perf list'.
(Arnaldo Carvalho de Melo)


# perf list clock

List of pre-defined events (to be used in -e):

cpu-clock [Software event]
task-clock [Software event]

uncore_cbox_0/clockticks/ [Kernel PMU event]
uncore_cbox_1/clockticks/ [Kernel PMU event]

kvm:kvm_pvclock_update [Tracepoint event]
kvm:kvm_update_master_clock [Tracepoint event]
power:clock_disable [Tracepoint event]
power:clock_enable [Tracepoint event]
power:clock_set_rate [Tracepoint event]
syscalls:sys_enter_clock_adjtime [Tracepoint event]
syscalls:sys_enter_clock_getres [Tracepoint event]
syscalls:sys_enter_clock_gettime [Tracepoint event]
syscalls:sys_enter_clock_nanosleep [Tracepoint event]
syscalls:sys_enter_clock_settime [Tracepoint event]
syscalls:sys_exit_clock_adjtime [Tracepoint event]
syscalls:sys_exit_clock_getres [Tracepoint event]
syscalls:sys_exit_clock_gettime [Tracepoint event]
syscalls:sys_exit_clock_nanosleep [Tracepoint event]
syscalls:sys_exit_clock_settime [Tracepoint event]
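
The "substring search as last resort" behaviour shown in the 'perf list clock'
output above can be sketched roughly as follows. This is only an illustration
under assumptions: find_events is a hypothetical name, and trying a glob match
first is an assumed ordering, not something taken from the perf sources:

```python
from fnmatch import fnmatchcase
from typing import List, Sequence


def find_events(pattern: str, names: Sequence[str]) -> List[str]:
    """Match event names by glob first, then by substring as a last resort."""
    # First attempt: treat the argument as a glob over the full event name.
    matches = [n for n in names if fnmatchcase(n, pattern)]
    if matches:
        return matches
    # Last resort: accept any event whose name contains the argument.
    return [n for n in names if pattern in n]


if __name__ == "__main__":
    events = ["cpu-clock", "task-clock", "power:clock_enable", "cycles"]
    for name in find_events("clock", events):
        print(name)
```

With the sample list in the sketch, "clock" matches no name as a glob, so the
substring pass returns the three names that contain it.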

Intel PT hardware tracing enhancements:

- Accept a zero --itrace period, meaning "as often as possible". In the case
of Intel PT that is the same as a period of 1 and a unit of 'instructions'
(i.e. --itrace=i1i). (Adrian Hunter)

- Harmonize itrace's synthesized callchains with the existing --max-stack
tool option. (Adrian Hunter)

- Allow time to be displayed in nanoseconds in 'perf script'. (Adrian Hunter)

- Fix potential infinite loop when handling Intel PT timestamps. (Adrian Hunter)

- Slightly improve Intel PT debug logging. (Adrian Hunter)

- Warn when AUX data has been lost, just like when processing PERF_RECORD_LOST.
(Adrian Hunter)

- Further document export-to-postgresql.py script. (Adrian Hunter)

- Add option to synthesize branch stack from auxtrace data. (Adrian Hunter)

Misc notable changes:

- Switch the default callchain output mode to 'graph,0.5,caller', to make it
look like the default for other tools, reducing the learning curve for
people used to 'caller' based viewing. (Arnaldo Carvalho de Melo)

- Various call chain usability enhancements. (Namhyung Kim)

- Introduce the 'P' event modifier, meaning 'max precision level, please', i.e.:

$ perf record -e cycles:P usleep 1

Is now similar to:

$ perf record usleep 1

Useful, for instance, when specifying multiple events. (Jiri Olsa)

- Add 'socket' sort entry, to sort by the processor socket in
'perf top' and 'perf report'. (Kan Liang)

- Introduce --socket-filter to 'perf report', for filtering by processor
socket. (Kan Liang)

- Add new "Zoom into Processor Socket" operation in the perf hists browser,
used in 'perf top' and 'perf report'. (Kan Liang)

- Allow probing on kmodules without DWARF. (Masami Hiramatsu)

- Fix 'perf probe -l' for probes added to kernel module functions. (Masami Hiramatsu)

- Preparatory work for the 'perf stat record' feature that will allow generating
perf.data files with counting data in addition to the sampling mode
we have now. (Jiri Olsa)

- Update libtraceevent KVM plugin. (Paolo Bonzini)

- ... plus lots of other enhancements that I failed to list properly, by:
Adrian Hunter, Alexander Shishkin, Andi Kleen, Andrzej Hajda,
Arnaldo Carvalho de Melo, Dima Kogan, Don Zickus, Geliang Tang, He Kuang,
Huaitong Han, Ingo Molnar, Jan Stancek, Jiri Olsa, Kan Liang, Kirill Tkhai,
Masami Hiramatsu, Matt Fleming, Namhyung Kim, Paolo Bonzini, Peter Zijlstra,
Rabin Vincent, Scott Wood, Stephane Eranian, Sukadev Bhattiprolu, Taku Izumi,
Vaishali Thakkar, Wang Nan, Yang Shi and Yunlong Song.



[ 600K+ diff omitted due to lkml mail size limits. ]
