Re: [PATCH] perf inject: Flush ordered events on FINISHED_ROUND

From: Jiri Olsa
Date: Thu Oct 08 2020 - 05:07:07 EST


On Tue, Oct 06, 2020 at 02:40:32PM +0900, namhyung@xxxxxxxxxx wrote:
> On Tue, Oct 06, 2020 at 11:39:49AM +0900, namhyung@xxxxxxxxxx wrote:
> > > > On Fri, Oct 02, 2020 at 10:03:17PM +0900, Namhyung Kim wrote:
> > > > > Below measures time and memory usage during the perf inject and
> > > > > report using ~190MB data file.
> > > > >
> > > > > Before:
> > > > > perf inject: 11.09 s, 382148 KB
> > > > > perf report: 8.05 s, 397440 KB
> > > > >
> > > > > After:
> > > > > perf inject: 16.24 s, 83376 KB
> > > > > perf report: 7.96 s, 216184 KB
> > > > >
> > > > > As you can see, it used 2x memory of the input size. I guess it's
> > > > > because it needs to keep the copy for the whole input. But I don't
> > > > > understand why processing time of perf inject increased..
> >
> > Measuring it with time shows:
> >
> > before after
> > real 11.309s 17.040s
> > user 8.084s 13.940s
> > sys 6.535s 6.732s
> >
> > So it's user space to make the difference. I've run perf record on
> > both (with cycles:U) and the dominant function is same: queue_event.
> > (46.98% vs 65.87%)
> >
> > It seems the flushing the queue makes more overhead on sorting.
>
> So I suspect the cache-miss ratio affects the performance. With
> flushing, data is processed in the middle and all the entries are
> reused after flush so it would invalidate all the cache lines
> occasionally.
>
> This is the perf stat result:
>
> * Before
>
> 7,167,414,019 L1-dcache-loads
> 337,471,761 L1-dcache-read-misses # 4.71% of all L1-dcache hits
>
> 11.011224671 seconds time elapsed
>
>
> * After
>
> 7,075,556,792 L1-dcache-loads
> 771,810,388 L1-dcache-read-misses # 10.91% of all L1-dcache hits
>
> 17.015901863 seconds time elapsed
>
>
> Hmm.. it's a memory & time trade-off then. Maybe we need a switch to
> select which one?

I'd keep the faster one ;-) so the one before

jirka