Re: [PATCH] Perf: Correct Assumptions about Sample Timestamps inPasses

From: David Ahern
Date: Fri Jan 03 2014 - 17:45:45 EST

On 1/3/14, 3:07 PM, Frederic Weisbecker wrote:
On Wed, Jan 01, 2014 at 11:37:55AM -0700, David Ahern wrote:
On 12/26/13, 8:30 AM, Frederic Weisbecker wrote:
On Thu, Dec 26, 2013 at 10:24:03AM -0500, David Ahern wrote:
On 12/26/13, 10:14 AM, Frederic Weisbecker wrote:
I was carrying that patch while working on perf-kvm-stat-live last
Fall. It does not solve the problem for live commands, so ended up
dropping it and going with local (to the command) hacks. I still
think for live commands getting a perf_clock timestamp at the start
of a round and using that as the flush time will work best.

For perf-kvm-stat-live using perf_clock value at the start of the
round as the flush time works beautifully:

Never once failed with "Warning: Timestamp below last timeslice
flush" error.

I'm not sure I understand why we need that. Why doesn't it work by simply flushing
events prior to the earliest timestamp among every CPUs last event?

Here's one scenario. Consider N-mmaps:

|----- t_flush
0 -----|---x------------------------
1 -----|----|------------------------
... |
N -----|-------ssss-|-----------------

t_start t_1 ... t_N

You start a round at some time -- t_start. By starting a round it means you go to mmap 0 and check for events, then mmap 1, ..., mmap N. It takes a finite amount of time to move from one mmap to another.

Assume there are no events on mmap 0, 1, ... N-1 but samples are generated in mmap N. In the time it takes to move forward from 0 to N, a sample can be generated for mmap 0 and written to the buffer - the 'x' above. It now contains a timestamp < than samples on any other mmap and out pops the flush error.

perf-kvm can have over 650,000 events per second and those tend to come in bunches on a single mmap. So even if you go for a "max of the min times across mmaps" it is often wrong.

The non-perf_clock logic in perf-kvm uses the min time across all mmaps and even it occasionally fails with the flush error.


I can see one remaining issue when an event interrupts another in a CPU. If the
interrupt happens after perf_prepare_sample() -> perf_clock() and perf_output_begin(),
we may have locally non-monotonic timestamps in a CPU buffer.

That could be solved with a heuristic similar to yours: flush events prior a few millisecs
before the barrier since interrupt are supposed to be short. Or we could move the perf_clock()
event snapshot to perf_output_sample() to make sure that the event space is reserved before
we get the timestamp, thus the interrupting events having superior timestamps are guaranteed
to be past the interrupted event in the stream.

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at