Re: [PATCH v2 00/11] perf c2c: Sort cacheline with all loads

From: Leo Yan
Date: Sun Jan 03 2021 - 21:48:53 EST


On Sun, Jan 03, 2021 at 11:52:19PM +0100, Jiri Olsa wrote:
> On Sun, Dec 13, 2020 at 01:38:39PM +0000, Leo Yan wrote:
> > This patch set is to sort cache line for all load operations which hit
> > any cache levels. For single cache line view, it shows the load
> > references for loads with cache hits and with cache misses respectively.
> >
> > This series is a following for the old patch set "perf c2c: Sort
> > cacheline with LLC load" [1], in the old patch set it tries to sort
> > cache line with the load operations in last level cache (LLC), after
> > testing we found the trace data doesn't contain LLC events if the
> > platform isn't a NUMA system. For this reason, this series refines the
> > implementation to sort on all cache levels hits of load operations; it's
> > reasonable for us to review the load and store opreations, if detects
> > any cache line is accessed by multi-threads, this hints that the cache
> > line is possible for false sharing.
> >
> > This patch set is clearly applied on perf/core branch with the latest
> > commit db0ea13cc741 ("perf evlist: Use the right prefix for 'struct
> > evlist' record methods"). And the changes has been tested on x86 and
> > Arm64, the testing result is shown as below.
>
> SNIP
>
> >
> >
> > [...]
> >
> > Changes from v1:
> > * Changed from sorting on LLC to sorting on all loads with cache hits;
> > * Added patches 06/11, 07/11 for refactoring macros;
> > * Added patch 08/11 for refactoring node header, so can display "%loads"
> > rather than "%hitms" in the header;
> > * Added patch 09/11 to add local pointers for pointing to output metrics
> > string and sort string (Juri);
> > * Added warning in percent_hitm() for the display "all", which should
> > never happen (Juri).
> >
> > [1] https://lore.kernel.org/patchwork/cover/1321514/
> >
> >
> > Leo Yan (11):
> > perf c2c: Add dimensions for total load hit
> > perf c2c: Add dimensions for load hit
> > perf c2c: Add dimensions for load miss
> > perf c2c: Rename for shared cache line stats
> > perf c2c: Refactor hist entry validation
> > perf c2c: Refactor display filter macro
> > perf c2c: Refactor node display macro
> > perf c2c: Refactor node header
> > perf c2c: Add local variables for output metrics
> > perf c2c: Sort on all cache hit for load operations
> > perf c2c: Update documentation for display option 'all'
> >
> > tools/perf/Documentation/perf-c2c.txt | 21 +-
> > tools/perf/builtin-c2c.c | 548 ++++++++++++++++++++++----
> > 2 files changed, 487 insertions(+), 82 deletions(-)
>
> Joe might want to test it first, but it looks all good to me:
>
> Acked-by: Jiri Olsa <jolsa@xxxxxxxxxx>

Thanks for the review, Jiri.

Note, after testing with Arm SPE, we found the store operations don't
contain the information for L1 cache hit or miss, this leads to there
have no statistics for "st_l1hit" and "st_l1miss"; finally the single
cache line view only can show the load samples and fails to show store
opreations due to the empty statistics for "st_l1hit" and "st_l1miss".

This is related the hardware issue, after some discussion internally,
so far cannot find a easy way to set memory flag for L1 cache hit or
miss for store operations (more specific, set flags PERF_MEM_LVL_HIT or
PERF_MEM_LVL_MISS for store's L1 cache accessing).

Given it is uncertain for this issue, please hold on for this patch
series and I will resend if have any conclusion.

And really sorry I notify this too late.

Thanks,
Leo