Re: [PATCH v3] perf stat: Show percore counts in per CPU output

From: Ravi Bangoria
Date: Fri Feb 14 2020 - 00:17:06 EST




On 2/13/20 8:40 PM, Jin, Yao wrote:


On 2/13/2020 9:20 PM, Ravi Bangoria wrote:
Hi Jin,

On 2/13/20 12:45 PM, Jin Yao wrote:
With this patch, for example,

 # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1

   Performance counter stats for 'system wide':

 CPU0 2,453,061 cpu/event=cpu-cycles,percore/
 CPU1 1,823,921 cpu/event=cpu-cycles,percore/
 CPU2 1,383,166 cpu/event=cpu-cycles,percore/
 CPU3 1,102,652 cpu/event=cpu-cycles,percore/
 CPU4 2,453,061 cpu/event=cpu-cycles,percore/
 CPU5 1,823,921 cpu/event=cpu-cycles,percore/
 CPU6 1,383,166 cpu/event=cpu-cycles,percore/
 CPU7 1,102,652 cpu/event=cpu-cycles,percore/

We can see counts are duplicated in CPU pairs
(CPU0/CPU4, CPU1/CPU5, CPU2/CPU6, CPU3/CPU7).


I was trying this patch and I am getting somewhat weird results when any
CPU is offline. For example:

   $ lscpu | grep list
   On-line CPU(s) list:             0-4,6,7
   Off-line CPU(s) list:            5

   $ sudo ./perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -vv -- sleep 1
     ...
   cpu/event=cpu-cycles,percore/: 0: 23746491 1001189836 1001189836
   cpu/event=cpu-cycles,percore/: 1: 19802666 1001291299 1001291299
   cpu/event=cpu-cycles,percore/: 2: 24211983 1001394318 1001394318
   cpu/event=cpu-cycles,percore/: 3: 54051396 1001516816 1001516816
   cpu/event=cpu-cycles,percore/: 4: 6378825 1001064048 1001064048
   cpu/event=cpu-cycles,percore/: 5: 21299840 1001166297 1001166297
   cpu/event=cpu-cycles,percore/: 6: 13075410 1001274535 1001274535
    Performance counter stats for 'system wide':
   CPU0              30,125,316      cpu/event=cpu-cycles,percore/
   CPU1              19,802,666      cpu/event=cpu-cycles,percore/
   CPU2              45,511,823      cpu/event=cpu-cycles,percore/
   CPU3              67,126,806      cpu/event=cpu-cycles,percore/
   CPU4              30,125,316      cpu/event=cpu-cycles,percore/
   CPU7              67,126,806      cpu/event=cpu-cycles,percore/
   CPU0              30,125,316      cpu/event=cpu-cycles,percore/
          1.001918764 seconds time elapsed

I see the proper result without --percore-show-thread:

   $ sudo ./perf stat -e cpu/event=cpu-cycles,percore/ -a -A -vv -- sleep 1
     ...
   cpu/event=cpu-cycles,percore/: 0: 11676414 1001190709 1001190709
   cpu/event=cpu-cycles,percore/: 1: 39119617 1001291459 1001291459
   cpu/event=cpu-cycles,percore/: 2: 41821512 1001391158 1001391158
   cpu/event=cpu-cycles,percore/: 3: 46853730 1001492799 1001492799
   cpu/event=cpu-cycles,percore/: 4: 14448274 1001095948 1001095948
   cpu/event=cpu-cycles,percore/: 5: 42238217 1001191187 1001191187
   cpu/event=cpu-cycles,percore/: 6: 33129641 1001292072 1001292072
    Performance counter stats for 'system wide':
   S0-D0-C0             26,124,688      cpu/event=cpu-cycles,percore/
   S0-D0-C1             39,119,617      cpu/event=cpu-cycles,percore/
   S0-D0-C2             84,059,729      cpu/event=cpu-cycles,percore/
   S0-D0-C3             79,983,371      cpu/event=cpu-cycles,percore/
          1.001961563 seconds time elapsed
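
The per-core totals in this run can be cross-checked by hand. A minimal sketch, assuming the number after the event name in each -vv line is a position in the online CPU list (0-4,6,7) rather than a CPU number, and assuming CPU n and CPU n+4 are siblings on core n (as the duplicated CPU pairs earlier in the thread suggest). The names sketch_online_cpus, sketch_raw_counts and sketch_sum_per_core are made up for illustration and are not perf code:

```c
#include <assert.h>

#define SKETCH_NR_ONLINE 7
#define SKETCH_NR_CORES  4

/* Online CPUs from the lscpu output in this mail: 0-4,6,7 (CPU5 is offline). */
static const int sketch_online_cpus[SKETCH_NR_ONLINE] = { 0, 1, 2, 3, 4, 6, 7 };

/* Raw counts copied from the -vv lines of the run without --percore-show-thread. */
static const long long sketch_raw_counts[SKETCH_NR_ONLINE] = {
	11676414, 39119617, 41821512, 46853730,
	14448274, 42238217, 33129641,
};

/* ASSUMPTION: CPU n and CPU n+4 share core n on this machine. */
static int sketch_core_of_cpu(int cpu)
{
	assert(cpu >= 0);
	return cpu % SKETCH_NR_CORES;
}

/* Add each online CPU's raw count into the total for its core. */
static void sketch_sum_per_core(long long totals[SKETCH_NR_CORES])
{
	for (int c = 0; c < SKETCH_NR_CORES; c++)
		totals[c] = 0;

	for (int idx = 0; idx < SKETCH_NR_ONLINE; idx++) {
		int cpu = sketch_online_cpus[idx];	/* map index -> CPU number */

		totals[sketch_core_of_cpu(cpu)] += sketch_raw_counts[idx];
	}
}
```

Under those assumptions the four totals match the S0-D0-C0 through S0-D0-C3 lines above; C1 has only CPU1's count because its sibling CPU5 is offline.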

[...]


Thanks so much for reporting this issue!

It looks like I should use the CPU index, not the CPU value, in print_percore_thread. I have a fix:

diff --git a/tools/perf/util/stat-display.c b/tools/perf/util/stat-display.c
index 7eb3643a97ae..d89cb0da90f8 100644
--- a/tools/perf/util/stat-display.c
+++ b/tools/perf/util/stat-display.c
@@ -1149,13 +1149,11 @@ static void print_footer(struct perf_stat_config *config)
 static void print_percore_thread(struct perf_stat_config *config,
                                  struct evsel *counter, char *prefix)
 {
-       int cpu, s, s2, id;
+       int s, s2, id;
        bool first = true;

        for (int i = 0; i < perf_evsel__nr_cpus(counter); i++) {
-               cpu = perf_cpu_map__cpu(evsel__cpus(counter), i);
-               s2 = config->aggr_get_id(config, evsel__cpus(counter), cpu);
-
+               s2 = config->aggr_get_id(config, evsel__cpus(counter), i);
                for (s = 0; s < config->aggr_map->nr; s++) {
                        id = config->aggr_map->map[s];
                        if (s2 == id)
@@ -1164,7 +1162,7 @@ static void print_percore_thread(struct perf_stat_config *config,

                print_counter_aggrdata(config, counter, s,
                                       prefix, false,
-                                      &first, cpu);
+                                      &first, i);
        }
 }
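
To illustrate why the index matters, here is a minimal sketch of a sparse CPU map for the reported system (online CPUs 0-4,6,7). It is a simplified model, not the real perf code: struct sketch_cpu_map, sketch_map_cpu() and sketch_core_of_idx() are hypothetical stand-ins, and the core numbering (CPU n and CPU n+4 on core n) is an assumption based on the duplicated CPU pairs shown earlier in the thread.

```c
#include <assert.h>

/* Hypothetical stand-in for a CPU map; not the real struct perf_cpu_map. */
struct sketch_cpu_map {
	int nr;			/* number of entries in map[] */
	const int *map;		/* map[idx] = CPU number at position idx */
};

/* Online CPUs from the report: 0-4,6,7 (CPU5 is offline). */
static const int sketch_online[] = { 0, 1, 2, 3, 4, 6, 7 };
static const struct sketch_cpu_map sketch_cpus = { 7, sketch_online };

/* Return the CPU number stored at position idx, or -1 if idx is out of range. */
static int sketch_map_cpu(const struct sketch_cpu_map *cpus, int idx)
{
	assert(cpus != 0);
	if (idx < 0 || idx >= cpus->nr)
		return -1;
	return cpus->map[idx];
}

/*
 * Return the core for the CPU at position idx, or -1 if idx is out of range.
 * ASSUMPTION: CPU n and CPU n+4 are siblings on core n.
 */
static int sketch_core_of_idx(const struct sketch_cpu_map *cpus, int idx)
{
	int cpu = sketch_map_cpu(cpus, idx);

	return cpu < 0 ? -1 : cpu % 4;
}
```

With CPU5 offline, index 5 refers to CPU6 and index 6 to CPU7. Passing the CPU value 6 to a helper that expects an index therefore selects CPU7 on core 3, which appears consistent with the "CPU7 67,126,806" line in the report, and the value 7 falls outside the map altogether.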

LGTM.

Tested-by: Ravi Bangoria <ravi.bangoria@xxxxxxxxxxxxx>

Ravi