Re: [PATCH 5/7] perf tools: Optimize sample parsing for ordered events
From: Ingo Molnar
Date: Tue Oct 31 2017 - 05:41:02 EST
* Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
> Currently when using ordered events we parse the sample
> twice (the perf_evlist__parse_sample function). Once
> before we queue the sample for sorting:
>
>   perf_session__process_event
>     perf_evlist__parse_sample(sample)
>     perf_session__queue_event(sample.time)
>
> And then when we deliver the sorted sample:
>
>   ordered_events__deliver_event
>     perf_evlist__parse_sample
>     perf_session__deliver_event
>
> We can skip the initial full sample parsing by using
> perf_evlist__parse_sample_timestamp function, which
> got introduced earlier. The new path looks like:
>
>   perf_session__process_event
>     perf_evlist__parse_sample_timestamp
>     perf_session__queue_event
>
>   ordered_events__deliver_event
>     perf_session__deliver_event
>       perf_evlist__parse_sample
>
> It saves some instructions and is slightly faster:
>
> Before:
> Performance counter stats for './perf.old report --stdio' (5 runs):
>
>     64,396,007,225      cycles:u                                  ( +- 0.97% )
>    105,882,112,735      instructions:u   # 1.64 insn per cycle    ( +- 0.00% )
>
>       21.618103465 seconds time elapsed                           ( +- 1.12% )
>
> After:
> Performance counter stats for './perf report --stdio' (5 runs):
>
>     60,567,807,182      cycles:u                                  ( +- 0.40% )
>    104,853,333,514      instructions:u   # 1.73 insn per cycle    ( +- 0.00% )
>
>       20.168895243 seconds time elapsed                           ( +- 0.32% )
That's a ~7% speedup in elapsed time (21.62s -> 20.17s), with ~6% fewer cycles, not bad!
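
For readers following along, here is a minimal standalone sketch of the idea in the changelog: read only the timestamp when queueing, and do the full parse once, at delivery time. This is not the perf code. All names below (struct raw_event, parse_timestamp, parse_sample, queue_event, deliver_sorted, full_parses) are hypothetical, and a fixed array plus qsort() stands in for the real ordered-events queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical raw record layout, for illustration only. */
struct raw_event {
	uint64_t time;
	uint64_t ip;
	uint32_t pid;
	uint32_t cpu;
};

/* Hypothetical fully parsed sample. */
struct sample {
	uint64_t time;
	uint64_t ip;
	uint32_t pid;
	uint32_t cpu;
};

/* Counts full parses so the saving is observable. */
static int full_parses;

/* Cheap path: touch only the field needed for sorting. */
static uint64_t parse_timestamp(const struct raw_event *raw)
{
	return raw->time;
}

/* Expensive path: decode every field. */
static void parse_sample(const struct raw_event *raw, struct sample *s)
{
	s->time = raw->time;
	s->ip = raw->ip;
	s->pid = raw->pid;
	s->cpu = raw->cpu;
	full_parses++;
}

struct queued_event {
	uint64_t time;               /* sort key, parsed at queue time */
	const struct raw_event *raw; /* parsed in full only on delivery */
};

#define MAX_QUEUED 64

struct queue {
	struct queued_event ev[MAX_QUEUED];
	size_t nr;
};

/* Queue an event using only its timestamp; returns -1 when full. */
static int queue_event(struct queue *q, const struct raw_event *raw)
{
	if (q->nr == MAX_QUEUED)
		return -1;
	q->ev[q->nr].time = parse_timestamp(raw);
	q->ev[q->nr].raw = raw;
	q->nr++;
	return 0;
}

static int cmp_time(const void *a, const void *b)
{
	const struct queued_event *x = a, *y = b;

	return (x->time > y->time) - (x->time < y->time);
}

/*
 * Sort by time, then fully parse each event exactly once while
 * delivering it into 'out' (which must hold at least q->nr entries).
 * qsort() is not stable, so the order of equal timestamps is unspecified.
 */
static size_t deliver_sorted(struct queue *q, struct sample *out)
{
	size_t i, nr = q->nr;

	assert(out != NULL);
	qsort(q->ev, nr, sizeof(q->ev[0]), cmp_time);
	for (i = 0; i < nr; i++)
		parse_sample(q->ev[i].raw, &out[i]);
	q->nr = 0;
	return nr;
}
```

With this shape each event costs one timestamp read plus one full parse, instead of two full parses.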
Thanks,
Ingo