Re: [BUG] perf report: ordered events and flushing bug

From: Adrian Hunter
Date: Thu Mar 12 2015 - 16:27:47 EST


On 12/03/2015 9:13 p.m., David Ahern wrote:
On 3/12/15 1:05 PM, Stephane Eranian wrote:
On Thu, Mar 12, 2015 at 5:02 AM, Adrian Hunter <adrian.hunter@xxxxxxxxx> wrote:
On 12/03/15 05:32, Stephane Eranian wrote:
Hi,

I am working on the JIT support to improve the flow and have
perf record inject the MMAPs at the end of the collection. For
that I piggyback on the buildid pass. To avoid rewriting the entire perf.data
file, I simply append the the MMAP records at the end of the file. And that
puts them out-of-order in time systematically. But I thought it would be okay
because perf report would sort them by timestamps again.

Well, it does not in all cases! Why?

Because of the round flushing. Based on how far out-of-order an event is,
it may not be processed correctly because of round flushing. I believe this
may only impact MMAP records. This is a serious issue because
mmaps drive symbolization of samples. If samples are processed without the
proper dso mapping, then samples may not be symbolized or may be wrongly
symbolized.

So far, the workaround I found was to set the oe->next_flush = 0 for
the ROUND mode.
In other words, do not flush anything until FINAL. To me, this is the
only sensible
way of avoiding this kind of problems. I am not sure I understand the
point of flushing
anyway, except to minimize memory footprint, maybe. But it does not
work with vastly
out-of-order mmaps.

Rounds and flushing after them helps with the user experience -- at least for some commands. On systems with 1024 cpus perf data files get large quickly and the resulting analysis command can appear to hang for long periods (e.g., i have done 1 second data collections that generate 5GB files and take 30-45 minutes before perf shows any output).

I have a patch to perf-record to allow a user to specify the poll timeout. That helps define the maximum time between 'rounds' on the record side and then commands that show each event (e.g, perf script) will show output much more quickly without the need to consume large amounts of memory or process the entire file before displaying events.


Do you have a better solution?

You could hook the ordered event delivery (see Arnaldo's perf/core branch):

if (tool->ordered_events) {
inject->deliver = session->ordered_events.deliver;
session->ordered_events.deliver = jit_mmap_deliver;
}

int jit_mmap_deliver(struct ordered_events *oe, struct ordered_event *event, struct perf_sample *sample)
{
struct perf_inject *inject = container_of(oe->tool, struct perf_inject, tool);

if (next_jit_mmap_time < sample->time) {
<synth mmap event>
perf_session__deliver_synth_event(...);
}
return inject->deliver(oe, event, sample);
}

I don't quite understand what this does?
Are you saying that in perf/core there is a new callback mechanism to
override the deliver phase
when you need ordered event and that callback could be used to avoid
the flushing?


I think the suggestion is to push the mmap events into the stream at the appropriate time slot.

Yes that was all. Sorry if it was confusing.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/