Re: Issue of metrics for multiple uncore PMUs (was Re: [RFC PATCH v2 23/23] perf metricgroup: remove duped metric group events)
From: Ian Rogers
Date: Fri Oct 02 2020 - 16:47:08 EST
On Fri, Oct 2, 2020 at 5:00 AM John Garry <john.garry@xxxxxxxxxx> wrote:
>
> On 07/05/2020 15:08, Ian Rogers wrote:
>
> Hi Ian,
>
> I was wondering if you ever tested commit 2440689d62e9 ("perf
> metricgroup: Remove duped metric group events") for when we have a
> metric which aliases multiple instances of the same uncore PMU in the
> system?
Sorry for this, I hadn't tested such a metric and wasn't aware of how
the aliasing worked. I sent a fix for this issue here:
https://lore.kernel.org/lkml/20200917201807.4090224-1-irogers@xxxxxxxxxx/
Could you see if this addresses the issue for you? I don't see the
change in Arnaldo's trees yet.
Thanks,
Ian
> I have been rebasing some of my arm64 perf work to v5.9-rc7, and find an
> issue where find_evsel_group() fails for the uncore metrics under the
> condition mentioned above.
>
> Unfortunately I don't have an x86 machine to which this test applies.
> However, as an experiment, I added a test metric to my broadwell JSON:
>
> diff --git a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
> b/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
> index 8cdc7c13dc2a..fc6d9adf996a 100644
> --- a/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
> +++ b/tools/perf/pmu-events/arch/x86/broadwell/bdw-metrics.json
> @@ -348,5 +348,11 @@
> "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
> "MetricGroup": "Power",
> "MetricName": "C7_Pkg_Residency"
> + },
> + {
> + "BriefDescription": "test metric",
> + "MetricExpr": "UNC_CBO_XSNP_RESPONSE.MISS_XCORE *
> UNC_CBO_XSNP_RESPONSE.MISS_EVICTION",
> + "MetricGroup": "Test",
> + "MetricName": "test_metric_inc"
> }
> ]
>
>
> And get this:
>
> john@localhost:~/linux/tools/perf> sudo ./perf stat -v -M
> test_metric_inc sleep 1
> Using CPUID GenuineIntel-6-3D-4
> metric expr unc_cbo_xsnp_response.miss_xcore *
> unc_cbo_xsnp_response.miss_eviction for test_metric_inc
> found event unc_cbo_xsnp_response.miss_eviction
> found event unc_cbo_xsnp_response.miss_xcore
> adding
> {unc_cbo_xsnp_response.miss_eviction,unc_cbo_xsnp_response.miss_xcore}:W
> unc_cbo_xsnp_response.miss_eviction -> uncore_cbox_1/umask=0x81,event=0x22/
> unc_cbo_xsnp_response.miss_eviction -> uncore_cbox_0/umask=0x81,event=0x22/
> unc_cbo_xsnp_response.miss_xcore -> uncore_cbox_1/umask=0x41,event=0x22/
> unc_cbo_xsnp_response.miss_xcore -> uncore_cbox_0/umask=0x41,event=0x22/
> Cannot resolve test_metric_inc: unc_cbo_xsnp_response.miss_xcore *
> unc_cbo_xsnp_response.miss_eviction
> task-clock: 688876 688876 688876
> context-switches: 2 688876 688876
> cpu-migrations: 0 688876 688876
> page-faults: 69 688876 688876
> cycles: 2101719 695690 695690
> instructions: 1180534 695690 695690
> branches: 249450 695690 695690
> branch-misses: 10815 695690 695690
>
> Performance counter stats for 'sleep 1':
>
> 0.69 msec task-clock # 0.001 CPUs
> utilized
> 2 context-switches # 0.003 M/sec
>
> 0 cpu-migrations # 0.000 K/sec
>
> 69 page-faults # 0.100 M/sec
>
> 2,101,719 cycles # 3.051 GHz
>
> 1,180,534 instructions # 0.56 insn per
> cycle
> 249,450 branches # 362.112 M/sec
>
> 10,815 branch-misses # 4.34% of all
> branches
>
> 1.001177693 seconds time elapsed
>
> 0.001149000 seconds user
> 0.000000000 seconds sys
>
>
> john@localhost:~/linux/tools/perf>
>
>
> Any idea what is going wrong here, before I have to dive in? The issue
> seems to be this named commit.
>
> Thanks,
> John
>
> > A metric group contains multiple metrics. These metrics may use the same
> > events. If metrics use separate events then it leads to more
> > multiplexing and overall metric counts fail to sum to 100%.
> > Modify how metrics are associated with events so that if the events in
> > an earlier group satisfy the current metric, the same events are used.
> > A record of used events is kept and at the end of processing unnecessary
> > events are eliminated.
> >
> > Before: