Re: [PATCH 17/19] perf tools: Always force PERF_RECORD_FINISHED_ROUND event

From: Jiri Olsa
Date: Fri Jul 25 2014 - 07:34:53 EST


On Thu, Jul 24, 2014 at 06:34:51PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Sun, Jul 20, 2014 at 11:56:01PM +0200, Jiri Olsa escreveu:
> > The PERF_RECORD_FINISHED_ROUND governs queue flushing in
> > reporting, so it needs to be stored for any kind of event.
> >
> > Forcing the PERF_RECORD_FINISHED_ROUND event to be stored any
> > time we finish the round and wrote at least one event.
>
> This is not just one change, it does two things and it is not clearly
> detailed here.
>
> The existing code would write a PERF_RECORD_FINISHED_ROUND per call to
> record__mmap_read() for tracepoints were present.
>
> I.e. if tracepoints and any other type of event type was present, then
> this syntetic PERF_RECORD_FINISHED_ROUND thing is inserted which will be
> handled at report time.
>
> Now you changed it to not check if tracepoints are present, i.e. it will
> always insert that after processing N events in
> PERF_RECORD_FINISHED_ROUND. (change #1)
>
> The thing is, before it didn't checked if N was zero, needlessly
> inserting PERF_RECORD_FINISHED_ROUND events in some corner cases.
>
> Now you, in addition to change #1 you also check that the number of
> bytes written before that event processing loop in record__mmap_read()
> is the same after the loop, i.e. if some events were written to the
> output perf.data file, only inserting the PERF_RECORD_FINISHED_ROUND
> events if so. (change #2)
>
> I think both changes are OK, but should be split in different patches,

right, I'll split it

> and while testing comparing having the patch applied versus patch
> applied + ignore the PERF_RECORD_FINISHED_ROUND events in the resulting
> perf.data file, I get:
>
> [root@zoo /]# perf stat -r 5 perf report --no-ordered-samples > /dev/null
> <SNIP symbol stuff>
> Performance counter stats for 'perf report --no-ordered-samples' (5 runs):
>
> 30263.033852 task-clock (msec) # 1.000 CPUs utilized ( +- 0.48% )
> 346 context-switches # 0.011 K/sec ( +- 54.65% )
> 182 cpu-migrations # 0.006 K/sec ( +- 29.94% )
> 89,951 page-faults # 0.003 M/sec ( +- 0.05% )
> 92,342,939,311 cycles # 3.051 GHz ( +- 0.46% )
> 50,605,250,178 stalled-cycles-frontend # 54.80% frontend cycles idle ( +- 0.88% )
> <not supported> stalled-cycles-backend
> 101,171,572,553 instructions # 1.10 insns per cycle
> # 0.50 stalled cycles per insn ( +- 0.01% )
> 22,643,469,804 branches # 748.222 M/sec ( +- 0.01% )
> 284,395,273 branch-misses # 1.26% of all branches ( +- 0.45% )
>
> 30.249514999 seconds time elapsed ( +- 0.48% )
>
> [root@zoo /]# perf stat -r 5 perf report --ordered-samples > /dev/null
> <SNIP symbol stuff>
> Performance counter stats for 'perf report --ordered-samples' (5 runs):
>
> 32665.828429 task-clock (msec) # 1.001 CPUs utilized ( +- 0.41% )
> 304 context-switches # 0.009 K/sec ( +- 23.32% )
> 102 cpu-migrations # 0.003 K/sec ( +- 21.70% )
> 79,405 page-faults # 0.002 M/sec ( +- 0.02% )
> 101,761,091,322 cycles # 3.115 GHz ( +- 0.40% )
> 57,627,138,326 stalled-cycles-frontend # 56.63% frontend cycles idle ( +- 0.65% )
> <not supported> stalled-cycles-backend
> 105,982,144,263 instructions # 1.04 insns per cycle
> # 0.54 stalled cycles per insn ( +- 0.01% )
> 23,493,670,235 branches # 719.212 M/sec ( +- 0.01% )
> 319,060,575 branch-misses # 1.36% of all branches ( +- 0.19% )
>
> 32.636483981 seconds time elapsed ( +- 0.41% )
>
> [root@zoo /]#

so those 2 extra seconds is the ordering time, right? sounds ok

jirka
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/