Re: [PATCH 43/49] perf stat: Add default hybrid events

From: Jin, Yao
Date: Mon Feb 08 2021 - 19:38:53 EST


Hi Arnaldo,

On 2/9/2021 3:10 AM, Arnaldo Carvalho de Melo wrote:
Em Mon, Feb 08, 2021 at 07:25:40AM -0800, kan.liang@xxxxxxxxxxxxxxx escreveu:
From: Jin Yao <yao.jin@xxxxxxxxxxxxxxx>

Previously if '-e' is not specified in perf stat, some software events
and hardware events are added to evlist by default.

root@otcpl-adl-s-2:~# ./perf stat -- ./triad_loop

Performance counter stats for './triad_loop':

109.43 msec task-clock # 0.993 CPUs utilized
1 context-switches # 0.009 K/sec
0 cpu-migrations # 0.000 K/sec
105 page-faults # 0.960 K/sec
401,161,982 cycles # 3.666 GHz
1,601,216,357 instructions # 3.99 insn per cycle
200,217,751 branches # 1829.686 M/sec
14,555 branch-misses # 0.01% of all branches

0.110176860 seconds time elapsed

Among the events, cycles, instructions, branches and branch-misses
are hardware events.

One hybrid platform, two events are created for one hardware event.

core cycles,
atom cycles,
core instructions,
atom instructions,
core branches,
atom branches,
core branch-misses,
atom branch-misses

These events will be added to evlist in order on hybrid platform
if '-e' is not set.

Since parse_events() has been supported to create two hardware events
for one event on hybrid platform, so we just use parse_events(evlist,
"cycles,instructions,branches,branch-misses") to create the default
events and add them to evlist.

After:
root@otcpl-adl-s-2:~# ./perf stat -vv -- taskset -c 16 ./triad_loop
...
------------------------------------------------------------
perf_event_attr:
type 1
size 120
config 0x1
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 27954 cpu -1 group_fd -1 flags 0x8 = 3
------------------------------------------------------------
perf_event_attr:
type 1
size 120
config 0x3
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 27954 cpu -1 group_fd -1 flags 0x8 = 4
------------------------------------------------------------
perf_event_attr:
type 1
size 120
config 0x4
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 27954 cpu -1 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 1
size 120
config 0x2
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 27954 cpu -1 group_fd -1 flags 0x8 = 7
------------------------------------------------------------
perf_event_attr:
type 6
size 120
config 0x400000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 27954 cpu -1 group_fd -1 flags 0x8 = 8
------------------------------------------------------------
perf_event_attr:
type 6
size 120
config 0xa00000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 27954 cpu -1 group_fd -1 flags 0x8 = 9
------------------------------------------------------------
perf_event_attr:
type 6
size 120
config 0x400000001
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 27954 cpu -1 group_fd -1 flags 0x8 = 10
------------------------------------------------------------
perf_event_attr:
type 6
size 120
config 0xa00000001
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 27954 cpu -1 group_fd -1 flags 0x8 = 11
------------------------------------------------------------
perf_event_attr:
type 6
size 120
config 0x400000004
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 27954 cpu -1 group_fd -1 flags 0x8 = 12
------------------------------------------------------------
perf_event_attr:
type 6
size 120
config 0xa00000004
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 27954 cpu -1 group_fd -1 flags 0x8 = 13
------------------------------------------------------------
perf_event_attr:
type 6
size 120
config 0x400000005
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid 27954 cpu -1 group_fd -1 flags 0x8 = 14
------------------------------------------------------------
perf_event_attr:
type 6
size 120
config 0xa00000005
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
...

Performance counter stats for 'taskset -c 16 ./triad_loop':

201.31 msec task-clock # 0.997 CPUs utilized
1 context-switches # 0.005 K/sec
1 cpu-migrations # 0.005 K/sec
166 page-faults # 0.825 K/sec
623,267,134 cycles # 3096.043 M/sec (0.16%)
603,082,383 cycles # 2995.777 M/sec (99.84%)
406,410,481 instructions # 2018.820 M/sec (0.16%)
1,604,213,375 instructions # 7968.837 M/sec (99.84%)
81,444,171 branches # 404.569 M/sec (0.16%)
200,616,430 branches # 996.550 M/sec (99.84%)
3,769,856 branch-misses # 18.727 M/sec (0.16%)
16,111 branch-misses # 0.080 M/sec (99.84%)

0.201895853 seconds time elapsed

We can see two events are created for one hardware event.
First one is core event the second one is atom event.

Can we have that (core/atom) as a prefix or in the comment area?


In next patch "perf stat: Uniquify hybrid event name", it would tell user the pmu which the event belongs to.

For example, I run the triad_loop on core cpu,

root@ssp-pwrt-002:# ./perf stat -- taskset -c 0 ./triad_loop

Performance counter stats for 'taskset -c 0 ./triad_loop':

287.87 msec task-clock # 0.990 CPUs utilized
30 context-switches # 0.104 K/sec
1 cpu-migrations # 0.003 K/sec
168 page-faults # 0.584 K/sec
450,089,808 cycles [cpu_core] # 1563.496 M/sec
<not counted> cycles [cpu_atom] (0.00%)
1,602,536,074 instructions [cpu_core] # 5566.797 M/sec
<not counted> instructions [cpu_atom] (0.00%)
200,474,560 branches [cpu_core] # 696.397 M/sec
<not counted> branches [cpu_atom] (0.00%)
23,002 branch-misses [cpu_core] # 0.080 M/sec
<not counted> branch-misses [cpu_atom] (0.00%)

We can see cpu_atom is not counted.

Thanks
Jin Yao

One thing is, the shadow stats looks a bit different, now it's just
'M/sec'.

The perf_stat__update_shadow_stats and perf_stat__print_shadow_stats
need to be improved in future if we want to get the original shadow
stats.

Reviewed-by: Andi Kleen <ak@xxxxxxxxxxxxxxx>
Signed-off-by: Jin Yao <yao.jin@xxxxxxxxxxxxxxx>
---
tools/perf/builtin-stat.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 44d1a5f..0b08665 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1137,6 +1137,13 @@ static int parse_hybrid_type(const struct option *opt,
return 0;
}
+static int add_default_hybrid_events(struct evlist *evlist)
+{
+ struct parse_events_error err;
+
+ return parse_events(evlist, "cycles,instructions,branches,branch-misses", &err);
+}
+
static struct option stat_options[] = {
OPT_BOOLEAN('T', "transaction", &transaction_run,
"hardware transaction statistics"),
@@ -1613,6 +1620,12 @@ static int add_default_attributes(void)
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
};
+ struct perf_event_attr default_sw_attrs[] = {
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
+ { .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
+};
/*
* Detailed stats (-d), covering the L1 and last level data caches:
@@ -1849,6 +1862,15 @@ static int add_default_attributes(void)
}
if (!evsel_list->core.nr_entries) {
+ perf_pmu__scan(NULL);
+ if (perf_pmu__hybrid_exist()) {
+ if (evlist__add_default_attrs(evsel_list,
+ default_sw_attrs) < 0) {
+ return -1;
+ }
+ return add_default_hybrid_events(evsel_list);
+ }
+
if (target__has_cpu(&target))
default_attrs0[0].config = PERF_COUNT_SW_CPU_CLOCK;
--
2.7.4