[PATCH] perf inject: Flush ordered events on FINISHED_ROUND

From: Namhyung Kim
Date: Fri Oct 02 2020 - 09:03:29 EST


Currently perf inject just repipes the event without any flush. It
makes an issue that it changes the order of events processed.

Normally we want to process events in time order, but events are
recorded from several cpus and they weren't sorted with each other.
So we put them into the ordered event queue, sorted by time, and flush
them when we see a next FINISHED_ROUND event.

But this is for events from kernel, user space events (like the
FINISHED_ROUND) are processed without queueing. So during the perf
inject, it writes all the FINISHED_ROUND events directly while
SAMPLE (and other) events are kept in the queue. This makes the user
space events are put before any kernel events.

You can see that with the following command:

$ perf record -a -o- sleep 1 | perf inject -b | perf script -i- --show-round-events
PERF_RECORD_FINISHED_ROUND
PERF_RECORD_FINISHED_ROUND
PERF_RECORD_FINISHED_ROUND
...

Omitting perf inject, you can see the events are placed in the middle
of the data.

You might argue that the whole point of the FINISHED_ROUND event is to
sort (kernel) events. And after perf inject, all events are written
in a proper time order so we don't care about the FINISHED_ROUND event
anymore.

But the problem is we don't know whether the input data is sorted or
not (maybe we can add a feature bit for this?) so it should use an
ordered event queue when processing the input like in perf report.

Remember all the FINISHED_ROUND events now come before other events so
the tool cannot know when it can flush the data so everything will be
queued until it meets the end of the input. Actually it's same for
perf inject itself as it doesn't flush the queue.

Below measures time and memory usage during the perf inject and
report using ~190MB data file.

Before:
perf inject: 11.09 s, 382148 KB
perf report: 8.05 s, 397440 KB

After:
perf inject: 16.24 s, 83376 KB
perf report: 7.96 s, 216184 KB

As you can see, it used 2x memory of the input size. I guess it's
because it needs to keep the copy for the whole input. But I don't
understand why processing time of perf inject increased..

I'm not sure how it affects the auxtrace, but it should be fine IMHO.

Cc: Al Grant <al.grant@xxxxxxxxxxxx>
Cc: Adrian Hunter <adrian.hunter@xxxxxxxxx>
Signed-off-by: Namhyung Kim <namhyung@xxxxxxxxxx>
---
tools/perf/builtin-inject.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 6d2f410d773a..9fa78a9edfc5 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -79,6 +79,7 @@ static int perf_event__repipe_oe_synth(struct perf_tool *tool,
union perf_event *event,
struct ordered_events *oe __maybe_unused)
{
+ ordered_events__flush(oe, OE_FLUSH__ROUND);
return perf_event__repipe_synth(tool, event);
}

--
2.28.0.806.g8561365e88-goog