Re: [GIT PULL 00/52] New Tool: perf c2c

From: Ingo Molnar
Date: Sat Oct 22 2016 - 04:28:47 EST



* Arnaldo Carvalho de Melo <acme@xxxxxxxxxx> wrote:

> Hi Ingo,
>
> Please consider pulling into tip/perf/core,
>
> Thanks,
>
> - Arnaldo
>
> The following changes since commit 10b37cb59fa1e61fec1386f324615e0e8202cd87:
>
> Merge tag 'perf-vendor_events-for-mingo-20161018' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2016-10-19 15:22:26 +0200)
>
> are available in the git repository at:
>
> git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-c2c-for-mingo-20161020
>
> for you to fetch changes up to 535bbde62701b2bb298063e9dfa007e8a1ff95d1:
>
> perf c2c report: Add --show-all option (2016-10-19 13:18:31 -0300)
>
> ----------------------------------------------------------------
> - The 'perf c2c' tool provides means for Shared Data C2C/HITM analysis.
>
> It allows you to track down cacheline contention. The tool is based
> on x86's load latency and precise store facility events provided by
> Intel CPUs.
>
> It was tested by Joe Mario and has proven to be useful, finding some
> cacheline contentions. Joe also wrote a blog post about the c2c tool,
> with examples:
>
> https://joemario.github.io/blog/2016/09/01/c2c-blog/
>
> Excerpt of the content on this site:
>
> ---
> At a high level, "perf c2c" will show you:
>
> * The cachelines where false sharing was detected.
> * The readers and writers to those cachelines, and the offsets where those accesses occurred.
> * The pid, tid, instruction addr, function name, binary object name for those readers and writers.
> * The source file and line number for each reader and writer.
> * The average load latency for the loads to those cachelines.
> * Which NUMA nodes the samples for a cacheline came from and which CPUs were involved.
>
> Using perf c2c is similar to using the Linux perf tool today.
> First collect data with "perf c2c record". Then generate a report output with "perf c2c report".
> ---
>
> There one finds extensive details on using the tool, with tips on
> reducing the volume of samples while still capturing enough to do
> its job. (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa)
>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
>
> ----------------------------------------------------------------
> Jiri Olsa (52):
> perf c2c: Introduce c2c_decode_stats function
> perf c2c: Introduce c2c_add_stats function
> perf c2c: Add c2c command
> perf c2c: Add record subcommand
> perf c2c: Add report subcommand
> perf c2c report: Add dimension support
> perf c2c report: Add sort_entry dimension support
> perf c2c report: Fallback to standard dimensions
> perf c2c report: Add sample processing
> perf c2c report: Add cacheline hists processing
> perf c2c report: Decode c2c_stats for hist entries
> perf c2c report: Add header macros
> perf c2c report: Add 'dcacheline' dimension key
> perf c2c report: Add 'offset' dimension key
> perf c2c report: Add 'iaddr' dimension key
> perf c2c report: Add hitm related dimension keys
> perf c2c report: Add stores related dimension keys
> perf c2c report: Add loads related dimension keys
> perf c2c report: Add llc and remote loads related dimension keys
> perf c2c report: Add llc load miss dimension key
> perf c2c report: Add total record sort key
> perf c2c report: Add total loads sort key
> perf c2c report: Add hitm percent sort key
> perf c2c report: Add hitm/store percent related sort keys
> perf c2c report: Add dram related sort keys
> perf c2c report: Add 'pid' sort key
> perf c2c report: Add 'tid' sort key
> perf c2c report: Add 'symbol' and 'dso' sort keys
> perf c2c report: Add 'node' sort key
> perf c2c report: Add stats related sort keys
> perf c2c report: Add 'cpucnt' sort key
> perf c2c report: Add src line sort key
> perf c2c report: Setup number of header lines for hists
> perf c2c report: Set final resort fields
> perf c2c report: Add stdio output support
> perf c2c report: Add main TUI browser
> perf c2c report: Add TUI cacheline browser
> perf c2c report: Add global stats stdio output
> perf c2c report: Add shared cachelines stats stdio output
> perf c2c report: Add c2c related stats stdio output
> perf c2c report: Allow to report callchains
> perf c2c report: Limit the cachelines table entries
> perf c2c report: Add support to choose local HITMs
> perf c2c report: Allow to set cacheline sort fields
> perf c2c report: Recalc width of global sort entries
> perf c2c report: Add cacheline index entry
> perf c2c report: Add support to manage symbol name length
> perf c2c report: Iterate node display in browser
> perf c2c report: Add help windows
> perf c2c: Add man page and credits
> perf c2c report: Add --no-source option
> perf c2c report: Add --show-all option
>
> tools/perf/Build | 1 +
> tools/perf/Documentation/perf-c2c.txt | 282 ++++
> tools/perf/builtin-c2c.c | 2754 +++++++++++++++++++++++++++++++++
> tools/perf/builtin.h | 1 +
> tools/perf/perf.c | 1 +
> tools/perf/ui/browsers/hists.c | 2 +-
> tools/perf/ui/browsers/hists.h | 1 +
> tools/perf/util/hist.c | 1 +
> tools/perf/util/hist.h | 1 +
> tools/perf/util/mem-events.c | 128 ++
> tools/perf/util/mem-events.h | 37 +
> tools/perf/util/sort.c | 2 +-
> tools/perf/util/sort.h | 1 +
> 13 files changed, 3210 insertions(+), 2 deletions(-)
> create mode 100644 tools/perf/Documentation/perf-c2c.txt
> create mode 100644 tools/perf/builtin-c2c.c

Pulled the perf-c2c-for-mingo-20161021 tag, thanks a lot Arnaldo!

I can see some teething problems. For example, if I run it on an older kernel (a v4.4
distro kernel), I get this:

triton:~/tip> perf c2c record perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes

Total time: 12.001 [sec]

12.001919 usecs/op
83320 ops/sec
[ perf record: Woken up 18 times to write data ]
[ perf record: Captured and wrote 5.356 MB perf.data (69804 samples) ]

but there's no 'perf c2c report' TUI output at all:

Shared Data Cache Line Table (0 entries, sorted on remote HITMs)
Total Rmt ----- LLC Load Hitm ----- ---- Store Reference ---- --- Load Dram ---- LLC Total ----- Core Load Hit ----- -- LLC Load Hit -
Index Cacheline records Hitm Total Lcl Rmt Total L1Hit L1Miss Lcl Rmt Ld Miss Loads FB L1 L2 Llc Rm

and just an empty screen.

If I do 'perf report' I get two events:

Available samples
24K cpu/mem-loads,ldlat=30/P
45K cpu/mem-stores/P

and both have some real data.

What am I missing?

Ingo
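
As background for the "Cacheline" column and the per-access offsets mentioned in
the quoted summary, here is a minimal sketch of how a data address maps to a
cacheline and an offset within it. It assumes 64-byte cachelines (typical on
x86, but not stated anywhere in this thread), and the helper names are
hypothetical; they are not taken from builtin-c2c.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines. Typical on x86, but not stated in this thread. */
#define CACHELINE_SIZE 64ULL

/* Hypothetical helper: start address of the cacheline that contains addr. */
static inline uint64_t cacheline_base(uint64_t addr)
{
	return addr & ~(CACHELINE_SIZE - 1);
}

/* Hypothetical helper: byte offset of addr within its cacheline. */
static inline uint64_t cacheline_offset(uint64_t addr)
{
	return addr & (CACHELINE_SIZE - 1);
}

/* Two accesses can contend (false sharing included) only if they hit the same line. */
static inline bool same_cacheline(uint64_t a, uint64_t b)
{
	return cacheline_base(a) == cacheline_base(b);
}
```

Under that assumption, addresses 0x1208 and 0x1210 fall in the same line (base
0x1200, offsets 0x08 and 0x10), so accesses to them from different CPUs are
candidates for the kind of contention the tool is meant to find, while 0x1238
and 0x1240 sit in different lines.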