Re: [PATCH 5/5] perf stat: Add topdown metrics in the default perf stat on the hybrid machine

From: Namhyung Kim
Date: Wed Jun 08 2022 - 20:10:10 EST


On Tue, Jun 7, 2022 at 1:08 AM <zhengjun.xing@xxxxxxxxxxxxxxx> wrote:
>
> From: Zhengjun Xing <zhengjun.xing@xxxxxxxxxxxxxxx>
>
> Topdown metrics are missed in the default perf stat on the hybrid machine,
> add Topdown metrics in default perf stat for hybrid systems.
>
> Currently, we support the perf metrics Topdown for the p-core PMU in the
> perf stat default, the perf metrics Topdown support for e-core PMU will be
> implemented later separately. Refactor the code adds two x86 specific
> functions. Widen the size of the event name column by 7 chars, so that all
> metrics after the "#" become aligned again.
>
> The perf metrics topdown feature is supported on the cpu_core of ADL. The
> dedicated perf metrics counter and the fixed counter 3 are used for the
> topdown events. Adding the topdown metrics doesn't trigger multiplexing.
>
> Before:
>
> # ./perf stat -a true
>
> Performance counter stats for 'system wide':
>
> 53.70 msec cpu-clock # 25.736 CPUs utilized
> 80 context-switches # 1.490 K/sec
> 24 cpu-migrations # 446.951 /sec
> 52 page-faults # 968.394 /sec
> 2,788,555 cpu_core/cycles/ # 51.931 M/sec
> 851,129 cpu_atom/cycles/ # 15.851 M/sec
> 2,974,030 cpu_core/instructions/ # 55.385 M/sec
> 416,919 cpu_atom/instructions/ # 7.764 M/sec
> 586,136 cpu_core/branches/ # 10.916 M/sec
> 79,872 cpu_atom/branches/ # 1.487 M/sec
> 14,220 cpu_core/branch-misses/ # 264.819 K/sec
> 7,691 cpu_atom/branch-misses/ # 143.229 K/sec
>
> 0.002086438 seconds time elapsed
>
> After:
>
> # ./perf stat -a true
>
> Performance counter stats for 'system wide':
>
> 61.39 msec cpu-clock # 24.874 CPUs utilized
> 76 context-switches # 1.238 K/sec
> 24 cpu-migrations # 390.968 /sec
> 52 page-faults # 847.097 /sec
> 2,753,695 cpu_core/cycles/ # 44.859 M/sec
> 903,899 cpu_atom/cycles/ # 14.725 M/sec
> 2,927,529 cpu_core/instructions/ # 47.690 M/sec
> 428,498 cpu_atom/instructions/ # 6.980 M/sec
> 581,299 cpu_core/branches/ # 9.470 M/sec
> 83,409 cpu_atom/branches/ # 1.359 M/sec
> 13,641 cpu_core/branch-misses/ # 222.216 K/sec
> 8,008 cpu_atom/branch-misses/ # 130.453 K/sec
> 14,761,308 cpu_core/slots/ # 240.466 M/sec
> 3,288,625 cpu_core/topdown-retiring/ # 22.3% retiring
> 1,323,323 cpu_core/topdown-bad-spec/ # 9.0% bad speculation
> 5,477,470 cpu_core/topdown-fe-bound/ # 37.1% frontend bound
> 4,679,199 cpu_core/topdown-be-bound/ # 31.7% backend bound
> 646,194 cpu_core/topdown-heavy-ops/ # 4.4% heavy operations # 17.9% light operations
> 1,244,999 cpu_core/topdown-br-mispredict/ # 8.4% branch mispredict # 0.5% machine clears
> 3,891,800 cpu_core/topdown-fetch-lat/ # 26.4% fetch latency # 10.7% fetch bandwidth
> 1,879,034 cpu_core/topdown-mem-bound/ # 12.7% memory bound # 19.0% Core bound
>
> 0.002467839 seconds time elapsed
>
> Signed-off-by: Zhengjun Xing <zhengjun.xing@xxxxxxxxxxxxxxx>
> Reviewed-by: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
> ---
[SNIP]
> +const char *arch_get_topdown_pmu_name(struct evlist *evlist, bool warn)
> +{
> + const char *pmu_name = "cpu";
> +
> + if (perf_pmu__has_hybrid()) {
> + if (!evlist->hybrid_pmu_name) {
> + if (warn)
> + pr_warning
> + ("WARNING: default to use cpu_core topdown events\n");
> + evlist->hybrid_pmu_name =
> + perf_pmu__hybrid_type_to_pmu("core");

This doesn't look good. Please consider reducing the
indent level like returning early as

if (!perf_pmu__has_hybrid())
return "cpu";

if (!evlist->hybrid_pmu_name) {
...

Thanks,
Namhyung


> + }
> +
> + pmu_name = evlist->hybrid_pmu_name;
> + }
> + return pmu_name;
> +}