[PATCH v1 0/8] perf c2c: Sort cacheline with LLC load

From: Leo Yan
Date: Thu Oct 15 2020 - 10:51:03 EST


If the memory events do not carry a HITM tag (as with Arm SPE), perf
c2c cannot rely on the HITM display to report cache false sharing. As
an alternative, we can use the LLC access and multi-thread info to
locate the data addresses that potentially suffer from false sharing.
If we then connect these addresses with the source code and analyze the
threads' execution timing, and can conclude that multiple threads load
and store the same cache line at the same time, this helps to resolve
the cache false sharing issue.
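
For readers less familiar with the problem, here is a minimal sketch of
the kind of layout that causes false sharing, and one common way to
avoid it. The struct names, field names and the 64-byte cache line size
are assumptions made for illustration only; they are not taken from the
test program or from the patches.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines; the real size depends on the CPU. */
#define CACHELINE_SIZE 64

/* Hypothetical layout: both fields land in the same cache line. */
struct shared_packed {
	uint64_t lock_counter;   /* written by the writer threads */
	uint64_t unrelated[3];   /* pushes the next field to offset 0x20 */
	uint64_t reader_value;   /* only read by the reader threads */
};

/* Same two fields, each aligned to the start of its own cache line. */
struct shared_padded {
	_Alignas(CACHELINE_SIZE) uint64_t lock_counter;
	_Alignas(CACHELINE_SIZE) uint64_t reader_value;
};

/* Return 1 if two byte offsets fall into the same cache line. */
static int same_cacheline(size_t off_a, size_t off_b)
{
	return (off_a / CACHELINE_SIZE) == (off_b / CACHELINE_SIZE);
}
```

With the packed layout, a store to lock_counter forces the readers of
reader_value to reload the line even though they never touch the
counter; with the padded layout the two fields no longer share a line.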

This patch set enables a display that sorts on LLC load accesses; it
adds dimensions for total LLC hit and LLC load accesses, and these
dimensions are used in the shared cache line table and the pareto.
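
As a rough illustration of the "LLC Hit" columns, the sketch below
recomputes the percentages from the LclHit and LclHitm values in the
sample output later in this letter. It assumes that the per-cacheline
total is LclHit + LclHitm and that the seven rows shown make up the
whole table; both are inferred from the numbers in the sample, not
taken from the patches. The struct and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical per-cacheline counters, named after the table columns. */
struct cl_stats {
	unsigned int lcl_hit;   /* LLC Load Hit: LclHit  */
	unsigned int lcl_hitm;  /* LLC Load Hit: LclHitm */
};

/* Assumed definition of the "LLC Hit Total" column for one cache line. */
static unsigned int cl_llc_hits(const struct cl_stats *s)
{
	return s->lcl_hit + s->lcl_hitm;
}

/* "LLC Hit Pct": share of one cache line in all LLC load hits. */
static double llc_hit_pct(const struct cl_stats *lines, size_t n, size_t idx)
{
	unsigned int total = 0;

	for (size_t i = 0; i < n; i++)
		total += cl_llc_hits(&lines[i]);
	if (!total)
		return 0.0;
	return 100.0 * cl_llc_hits(&lines[idx]) / total;
}

/* LclHit/LclHitm pairs copied from rows 0..6 of the sample table. */
static const struct cl_stats sample_lines[] = {
	{ 143, 505 }, { 262, 1 }, { 14, 63 }, { 1, 0 },
	{ 1, 0 }, { 0, 1 }, { 1, 0 },
};
```

Under these assumptions, llc_hit_pct(sample_lines, 7, 0) is 648 out of
992 hits, which matches the 65.32% reported for cache line 0.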

This patch set depends on the patch set "perf c2c: Refine the
organization of metrics" [1].

[1] https://lore.kernel.org/patchwork/cover/1321499/

With this patch set, we can get the 'llc' display as follows:

# perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio

[...]

=================================================
Shared Data Cache Line Table
=================================================
#
# -------------- Cacheline --------------  LLC Hit  LLC Hit    Total    Total    Total  ---- Stores ----  ----- Core Load Hit -----  - LLC Load Hit -  - RMT Load Hit -  --- Load Dram --
# Index             Address  Node  PA cnt      Pct    Total  records    Loads   Stores    L1Hit   L1Miss       FB       L1       L2   LclHit  LclHitm   RmtHit  RmtHitm      Lcl      Rmt
# .....  ..................  ....  ......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......  .......
#
      0      0x563b01e83100     0    1401   65.32%      648     7011     3738     3273     2582      691      515     2516       59      143      505        0        0        0        0
      1      0x563b01e830c0     0       1   26.51%      263      400      400        0        0        0      130        3        4      262        1        0        0        0        0
      2      0x563b01e83080     0       1    7.76%       77      650      650        0        0        0      180      348       45       14       63        0        0        0        0
      3  0xffff88c3d74e82c0     0       1    0.10%        1        1        1        0        0        0        0        0        0        1        0        0        0        0        0
      4  0xffffa587c11e38c0   N/A       0    0.10%        1        2        1        1        1        0        0        0        0        1        0        0        0        0        0
      5  0xffffffffbd5e6fc0     0       1    0.10%        1        1        1        0        0        0        0        0        0        0        1        0        0        0        0
      6      0x7f90a4d6c2c0     0       1    0.10%        1        1        1        0        0        0        0        0        0        1        0        0        0        0        0

=================================================
Shared Cache Line Distribution Pareto
=================================================
#
#         ---- LLC LD ----  -- Store Refs --  --------- Data address ---------                                                   ---------- cycles ----------    Total       cpu                                                                     Shared
#   Num   LclHit  LclHitm   L1 Hit  L1 Miss              Offset  Node  PA cnt      Pid                 Tid        Code address  rmt hitm  lcl hitm      load  records       cnt  Symbol               Object             Source:Line                  Node
# .....  .......  .......  .......  .......  ..................  ....  ......  .......  ..................  ..................  ........  ........  ........  .......  ........  ...................  .................  ...........................  ....
#
  -------------------------------------------------------------
      0      143      505     2582      691      0x563b01e83100
  -------------------------------------------------------------
          96.50%    7.72%   46.79%    0.00%                 0x0     0       1    14100       14102:lock_th      0x563b01c81c16         0      1949      1331     1876         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145     0
           0.00%   35.05%    0.00%    0.00%                 0x0     0       1    14100       14102:lock_th      0x563b01c81c1d         0      2651       975      748         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146     0
           0.00%   30.89%    0.00%    0.00%                 0x0     0       1    14100       14103:lock_th      0x563b01c81c1d         0      1425      1003      762         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146     0
           2.10%    7.52%   49.19%    0.00%                 0x0     0       1    14100       14103:lock_th      0x563b01c81c16         0      1585      1053     2037         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145     0
           0.00%    0.00%    2.52%   44.86%                 0x0     0       1    14100       14102:lock_th      0x563b01c81c28         0         0         0      375         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146     0
           0.00%    0.00%    1.51%   55.14%                 0x0     0       1    14100       14103:lock_th      0x563b01c81c28         0         0         0      420         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146     0
           1.40%   12.87%    0.00%    0.00%                0x20     0       1    14100    14104:reader_thd      0x563b01c81c73         0       166        99      417         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155     0
           0.00%    5.94%    0.00%    0.00%                0x20     0       1    14100    14105:reader_thd      0x563b01c81c73         0       144        85      376         1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:155     0

[...]
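
To read the "Offset" column in the pareto above: each sampled data
address is split into a cache line address and an offset within that
line. The sketch below shows the arithmetic, assuming a 64-byte cache
line (the real size is platform dependent); the helper names are
hypothetical and are not the functions used in builtin-c2c.c.

```c
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHELINE_SIZE 64ULL

/* Cache line address: the data address with the low bits cleared. */
static uint64_t cl_address(uint64_t addr)
{
	return addr & ~(CACHELINE_SIZE - 1);
}

/* Offset of the data address within its cache line. */
static uint64_t cl_offset(uint64_t addr)
{
	return addr & (CACHELINE_SIZE - 1);
}
```

For example, a data address of 0x563b01e83120 (cache line
0x563b01e83100 plus the offset 0x20 reported for the reader threads)
gives cl_address() == 0x563b01e83100 and cl_offset() == 0x20. The lock
threads at offset 0x0 and the reader threads at offset 0x20 therefore
touch different data within the same line.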


Leo Yan (8):
  perf mem: Add structure field c2c_stats::tot_llchit
  perf c2c: Add dimensions for total LLC hit
  perf c2c: Add dimensions for LLC load hit
  perf c2c: Change to general naming for macros
  perf c2c: Rename for shared cache line stats
  perf c2c: Refactor hist entry validation
  perf c2c: Add option '-d llc' for sorting with LLC load
  perf c2c: Update documentation for display option 'llc'

 tools/perf/Documentation/perf-c2c.txt |  18 +-
 tools/perf/builtin-c2c.c              | 333 +++++++++++++++++++++-----
 tools/perf/util/mem-events.c          |   3 +
 tools/perf/util/mem-events.h          |   1 +
 4 files changed, 286 insertions(+), 69 deletions(-)

--
2.17.1