Re: [PATCH] perf stat: Show percore counts in per CPU output

From: Jiri Olsa
Date: Mon Feb 10 2020 - 09:01:39 EST


On Mon, Feb 10, 2020 at 09:46:46PM +0800, Jin, Yao wrote:
>
>
> On 2/10/2020 9:28 PM, Jiri Olsa wrote:
> > On Thu, Feb 06, 2020 at 09:56:13AM +0800, Jin Yao wrote:
> > > We already support the event modifier "percore", which sums up the
> > > event counts for all hardware threads in a core and shows the counts
> > > per core.
> > >
> > > For example,
> > >
> > > # perf stat -e cpu/event=cpu-cycles,percore/ -a -A -- sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > S0-D0-C0 395,072 cpu/event=cpu-cycles,percore/
> > > S0-D0-C1 851,248 cpu/event=cpu-cycles,percore/
> > > S0-D0-C2 954,226 cpu/event=cpu-cycles,percore/
> > > S0-D0-C3 1,233,659 cpu/event=cpu-cycles,percore/
> > >
> > > This patch provides a new option "--percore-show-thread". It is
> > > used together with the event modifier "percore" to sum up the event
> > > counts for all hardware threads in a core but show the counts per
> > > hardware thread.
> > >
> > > For example,
> > >
> > > # perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -- sleep 1
> > >
> > > Performance counter stats for 'system wide':
> > >
> > > CPU0 2,453,061 cpu/event=cpu-cycles,percore/
> > > CPU1 1,823,921 cpu/event=cpu-cycles,percore/
> > > CPU2 1,383,166 cpu/event=cpu-cycles,percore/
> > > CPU3 1,102,652 cpu/event=cpu-cycles,percore/
> > > CPU4 2,453,061 cpu/event=cpu-cycles,percore/
> > > CPU5 1,823,921 cpu/event=cpu-cycles,percore/
> > > CPU6 1,383,166 cpu/event=cpu-cycles,percore/
> > > CPU7 1,102,652 cpu/event=cpu-cycles,percore/
> >
> > I don't understand how this is different from the -A output:
> >
> > # ./perf stat -e cpu/event=cpu-cycles/ -A
> > ^C
> > Performance counter stats for 'system wide':
> >
> > CPU0 56,847,497 cpu/event=cpu-cycles/
> > CPU1 75,274,384 cpu/event=cpu-cycles/
> > CPU2 63,866,342 cpu/event=cpu-cycles/
> > CPU3 89,559,693 cpu/event=cpu-cycles/
> > CPU4 74,761,132 cpu/event=cpu-cycles/
> > CPU5 76,320,191 cpu/event=cpu-cycles/
> > CPU6 55,100,175 cpu/event=cpu-cycles/
> > CPU7 48,472,895 cpu/event=cpu-cycles/
> >
> > 1.074800857 seconds time elapsed
> >
>
> The results are different.
>
> With --percore-show-thread, CPU0 and CPU4 have the same counts (e.g.
> 2,453,061 in my example) because they are siblings. The value is the sum
> of CPU0 + CPU4.
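
To make the semantics described here concrete, a minimal Python sketch (an
illustration only, not perf code). It assumes CPU n and CPU n+4 are siblings,
as in the CPU0/CPU4 example, and reuses the per-CPU counts from the plain -A
run quoted earlier. The names `siblings` and `percore_show_thread` are made up
for this sketch.

```python
# Per-CPU counts copied from the plain -A output quoted earlier in the thread.
per_cpu = {
    0: 56_847_497, 1: 75_274_384, 2: 63_866_342, 3: 89_559_693,
    4: 74_761_132, 5: 76_320_191, 6: 55_100_175, 7: 48_472_895,
}

# Assumed topology: CPU n and CPU n+4 share a core (e.g. CPU0 and CPU4).
siblings = {cpu: (cpu % 4, cpu % 4 + 4) for cpu in per_cpu}


def percore_show_thread(per_cpu, siblings):
    """Return, for every CPU, the sum over all hardware threads in its core."""
    return {
        cpu: sum(per_cpu[s] for s in siblings[cpu])
        for cpu in sorted(per_cpu)
    }


# One row per CPU; siblings end up with identical values.
for cpu, count in percore_show_thread(per_cpu, siblings).items():
    print(f"CPU{cpu} {count:>15,}")
```

Under these assumptions each row still names a CPU, but siblings carry the
same core-wide sum, whereas plain -A reports each thread's own count.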

so it shows percore stats but displays all the cpus? what is this good for?
to see which cpus are in a core? if that's the case then I think we could
somehow display the cpu numbers for each core in the --per-core output, like:

S0-D0-C0(0,4) 395,072 cpu/event=cpu-cycles,percore/
S0-D0-C1(1,5) 851,248 cpu/event=cpu-cycles,percore/
S0-D0-C2(2,6) 954,226 cpu/event=cpu-cycles,percore/
S0-D0-C3(3,7) 1,233,659 cpu/event=cpu-cycles,percore/
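
A rough Python sketch of how such labels could be assembled (not perf code,
just to illustrate the proposed format). The core counts and CPU lists are the
ones from the example above; the helper names `core_label` and
`format_per_core` are hypothetical, and socket 0 / die 0 are assumed for every
core.

```python
EVENT = "cpu/event=cpu-cycles,percore/"

# core id -> (CPUs in that core, summed count), taken from the example above.
cores = {
    0: ((0, 4), 395_072),
    1: ((1, 5), 851_248),
    2: ((2, 6), 954_226),
    3: ((3, 7), 1_233_659),
}


def core_label(core, cpus, socket=0, die=0):
    """Build an 'S<socket>-D<die>-C<core>(cpu,cpu)' label (socket/die assumed 0)."""
    cpu_list = ",".join(str(c) for c in cpus)
    return f"S{socket}-D{die}-C{core}({cpu_list})"


def format_per_core(cores):
    """Return one output line per core, in core-id order."""
    return [
        f"{core_label(core, cpus)} {count:,} {EVENT}"
        for core, (cpus, count) in sorted(cores.items())
    ]


for line in format_per_core(cores):
    print(line)
```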


>
> Without --percore-show-thread, CPU0 and CPU4 have their own counts.
>
> > also the interval output is mangled:
> >
> > # ./perf stat -e cpu/event=cpu-cycles,percore/ -a -A --percore-show-thread -I 1000
> > # time CPU counts unit events
> > 1.000177375 1.000177375 CPU0 138,483,540 cpu/event=cpu-cycles,percore/
> > 1.000177375 1.000177375 CPU1 143,159,477 cpu/event=cpu-cycles,percore/
> > 1.000177375 1.000177375 CPU2 177,554,642 cpu/event=cpu-cycles,percore/
> > 1.000177375 1.000177375 CPU3 150,974,512 cpu/event=cpu-cycles,percore/
> > 1.000177375 1.000177375 CPU4 138,483,540 cpu/event=cpu-cycles,percore/
> > 1.000177375 1.000177375 CPU5 143,159,477 cpu/event=cpu-cycles,percore/
> > 1.000177375 1.000177375 CPU6 177,554,642 cpu/event=cpu-cycles,percore/
> >
> > jirka
> >
>
> Sorry, why is the interval output mangled? It's expected that CPU0 and CPU4
> have the same counts.

there are 2 timestamp columns and the header line does
not align with the data

jirka