Re: [PATCH 1/2] perf tools: Add option to copy events when queueing

From: Ingo Molnar
Date: Fri Oct 03 2014 - 04:48:09 EST



* Jiri Olsa <jolsa@xxxxxxxxxx> wrote:

> On Fri, Oct 03, 2014 at 06:34:21AM +0200, Ingo Molnar wrote:
> >
> > * Alexander Yarygin <yarygin@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > > When processing events the session code has an ordered samples
> > > queue which is used to time-sort events coming in across
> > > multiple mmaps. At a later point in time samples on the queue
> > > are flushed up to some timestamp at which point the event is
> > > actually processed.
> > >
> > > When analyzing events live (i.e., record/analysis path in the
> > > same command) there is a race that leads to corrupted events
> > > and parse errors which cause perf to terminate. The problem is
> > > that when the event is placed in the ordered samples queue it
> > > is only a reference to the event which is really sitting in the
> > > mmap buffer. Even though the event is queued for later
> > > processing the mmap tail pointer is updated which indicates to
> > > the kernel that the event has been processed. The race is
> > > flushing the event from the queue before it gets overwritten by
> > > some other event. For commands trying to process events live
> > > (versus just writing to a file) and processing a high rate of
> > > events this leads to parse failures and perf terminates.
> > >
> > > Examples hitting this problem are 'perf kvm stat live',
> > > especially with nested VMs which generate 100,000+ traces per
> > > second, and a command processing scheduling events with a high
> > > rate of context switching -- e.g., running 'perf bench sched
> > > pipe'.
> > >
> > > This patch offers live commands an option to copy the event
> > > when it is placed in the ordered samples queue.
> >
> > What's the performance effect of this - i.e. by how much does CPU
> > use increase due to copying the events?
> >
> > Wouldn't it be faster to fix this problem by updating the mmap
> > tail pointer only once the event has truly been consumed?
>
> Alexander mentioned he'd lose data, because of userspace
> processing being too slow:
>
> http://marc.info/?l=linux-kernel&m=141111652424818&w=2

So copying helps by essentially allocating a larger buffer, to
hold all the unprocessed events that user-space is too slow to
process?

I guess it's a valid use case.
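
For reference, here is how I read the race from the changelog, as a
toy model rather than anything resembling the actual perf session
code. Everything in it is an assumption made for illustration: the
names (replay, NSLOTS, SLOT), the 4-slot ring, and the idea that an
event is just a 64-bit timestamp. The memoryview slice plays the role
of the reference into the mmap buffer, and bytes() plays the role of
the copy this patch adds:

```python
import heapq
import struct

SLOT = struct.Struct("<Q")  # toy event: a single 64-bit timestamp (assumption)
NSLOTS = 4                  # deliberately tiny ring so that it wraps quickly


def replay(n_events, copy_on_queue):
    """Queue n_events from a toy ring buffer, then flush them in time order.

    Returns a list of (timestamp_when_queued, timestamp_read_at_flush).
    The two values differ whenever the slot was overwritten in between.
    """
    ring = bytearray(SLOT.size * NSLOTS)  # stands in for the mmap buffer
    view = memoryview(ring)
    queue = []                            # stands in for the ordered samples queue

    for ts in range(n_events):
        off = (ts % NSLOTS) * SLOT.size
        # The "kernel" writes the next event. The tail pointer was already
        # advanced past older events, so it is free to reuse their slots.
        SLOT.pack_into(ring, off, ts)

        event = view[off:off + SLOT.size]  # a reference into the ring
        if copy_on_queue:
            event = bytes(event)           # a private copy, safe from reuse
        heapq.heappush(queue, (ts, event))

    flushed = []
    while queue:
        ts, event = heapq.heappop(queue)
        flushed.append((ts, SLOT.unpack(event)[0]))
    return flushed
```

With six events and four slots, replay(6, False) hands back event 0
with the payload of event 4, because the slot was reused before the
flush; replay(6, True) returns every event intact. The price is one
copy per queued event, plus a queue whose memory now grows with the
backlog instead of being bounded by the mmap size.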

Thanks,

Ingo