Re: [PATCH] perf stat: Show percore counts in per CPU output

From: Jin, Yao
Date: Mon Feb 10 2020 - 08:46:53 EST




On 2/10/2020 9:28 PM, Jiri Olsa wrote:
On Thu, Feb 06, 2020 at 09:56:13AM +0800, Jin Yao wrote:
We already support the event modifier "percore", which sums up the
event counts for all hardware threads in a core and shows the counts
per core.

For example,

# perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1

Performance counter stats for 'system wide':

S0-D0-C0 395,072 cpu/event=cpu-cycles,percore/
S0-D0-C1 851,248 cpu/event=cpu-cycles,percore/
S0-D0-C2 954,226 cpu/event=cpu-cycles,percore/
S0-D0-C3 1,233,659 cpu/event=cpu-cycles,percore/

This patch provides a new option, "--percore-show-thread". It is
used together with the event modifier "percore" to sum up the event
counts for all hardware threads in a core, but show the counts per
hardware thread.

For example,

# perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1

Performance counter stats for 'system wide':

CPU0 2,453,061 cpu/event=cpu-cycles,percore/
CPU1 1,823,921 cpu/event=cpu-cycles,percore/
CPU2 1,383,166 cpu/event=cpu-cycles,percore/
CPU3 1,102,652 cpu/event=cpu-cycles,percore/
CPU4 2,453,061 cpu/event=cpu-cycles,percore/
CPU5 1,823,921 cpu/event=cpu-cycles,percore/
CPU6 1,383,166 cpu/event=cpu-cycles,percore/
CPU7 1,102,652 cpu/event=cpu-cycles,percore/

I don't understand how this is different from the -A output:

# ./perf stat -e cpu/event=cpu-cycles/ -A
^C
Performance counter stats for 'system wide':

CPU0 56,847,497 cpu/event=cpu-cycles/
CPU1 75,274,384 cpu/event=cpu-cycles/
CPU2 63,866,342 cpu/event=cpu-cycles/
CPU3 89,559,693 cpu/event=cpu-cycles/
CPU4 74,761,132 cpu/event=cpu-cycles/
CPU5 76,320,191 cpu/event=cpu-cycles/
CPU6 55,100,175 cpu/event=cpu-cycles/
CPU7 48,472,895 cpu/event=cpu-cycles/

1.074800857 seconds time elapsed


The results are different.

With --percore-show-thread, CPU0 and CPU4 have the same count (2,453,061 in my example) because they are sibling hardware threads of the same core. The value is the sum of the CPU0 and CPU4 counts.

In your -A output (without the percore modifier and --percore-show-thread), CPU0 and CPU4 each have their own counts.
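To make the arithmetic concrete, here is a minimal Python sketch (not perf's implementation) that takes the per-CPU counts from the plain -A output above and derives what each display mode would show. It assumes a hypothetical topology in which CPU n and CPU n+4 are siblings on core n, matching the CPU0/CPU4 pair here; a real tool would read the sibling map from the system topology, and the helper names are made up for illustration.

```python
# Assumption: 8 CPUs on 4 cores, where CPU n and CPU n + NR_CORES are the two
# hardware threads of core n (as CPU0/CPU4 are in this thread).
NR_CORES = 4


def cpu_to_core(cpu: int) -> int:
    """Hypothetical sibling map: CPU n belongs to core n % NR_CORES."""
    return cpu % NR_CORES


def percore_sums(cpu_counts: list[int]) -> dict[int, int]:
    """'percore' aggregation: sum the counts of all hardware threads in a core."""
    sums: dict[int, int] = {}
    for cpu, count in enumerate(cpu_counts):
        core = cpu_to_core(cpu)
        sums[core] = sums.get(core, 0) + count
    return sums


def percore_show_thread(cpu_counts: list[int]) -> list[int]:
    """Per-thread display: every CPU reports the summed count of its core."""
    sums = percore_sums(cpu_counts)
    return [sums[cpu_to_core(cpu)] for cpu in range(len(cpu_counts))]


# Raw per-thread cpu-cycles counts, copied from the plain -A output above.
a_counts = [56_847_497, 75_274_384, 63_866_342, 89_559_693,
            74_761_132, 76_320_191, 55_100_175, 48_472_895]

for cpu, value in enumerate(percore_show_thread(a_counts)):
    print(f"CPU{cpu} {value:,}")
```

Running it prints 131,608,629 for both CPU0 and CPU4 (56,847,497 + 74,761,132), and each other sibling pair likewise shares one value, which is the pattern the --percore-show-thread output shows. Plain -A, by contrast, prints the raw a_counts entries unchanged.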

Also, the interval output is mangled:

# ./perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I 1000
# time CPU counts unit events
1.000177375 1.000177375 CPU0 138,483,540 cpu/event=cpu-cycles,percore/
1.000177375 1.000177375 CPU1 143,159,477 cpu/event=cpu-cycles,percore/
1.000177375 1.000177375 CPU2 177,554,642 cpu/event=cpu-cycles,percore/
1.000177375 1.000177375 CPU3 150,974,512 cpu/event=cpu-cycles,percore/
1.000177375 1.000177375 CPU4 138,483,540 cpu/event=cpu-cycles,percore/
1.000177375 1.000177375 CPU5 143,159,477 cpu/event=cpu-cycles,percore/
1.000177375 1.000177375 CPU6 177,554,642 cpu/event=cpu-cycles,percore/

jirka


Sorry, why is the interval output mangled? It's expected that CPU0 and CPU4 have the same counts.

Thanks
Jin Yao