Re: [PATCH] Perf: Correct Assumptions about Sample Timestamps inPasses

From: Frederic Weisbecker
Date: Sat Jan 04 2014 - 10:06:26 EST

On Fri, Jan 03, 2014 at 03:45:36PM -0700, David Ahern wrote:
> On 1/3/14, 3:07 PM, Frederic Weisbecker wrote:
> >I'm not sure I understand why we need that. Why doesn't it work by simply flushing
> >events prior to the earliest timestamp among every CPUs last event?
> Here's one scenario. Consider N-mmaps:
> |----- t_flush
> v
> 0 -----|---x------------------------
> 1 -----|----|------------------------
> ... |
> N -----|-------ssss-|-----------------
> t_start t_1 ... t_N
> You start a round at some time -- t_start. By starting a round it
> means you go to mmap 0 and check for events, then mmap 1, ..., mmap
> N. It takes a finite amount of time to move from one mmap to
> another.
> Assume there are no events on mmap 0, 1, ... N-1 but samples are
> generated in mmap N. In the time it takes to move forward from 0 to
> N, a sample can be generated for mmap 0 and written to the buffer -
> the 'x' above. It now contains a timestamp < than samples on any
> other mmap and out pops the flush error.

Lets reformulate as following. I'm copy-pasting the example in session.c
but adapting it to your scenario.

* ============ PASS n =================
* CPU 0 | CPU 1
* |
* - | 2
* - | 3
* - | 4 <--- max recorded
* ============ PASS n + 1 ==============
* CPU 0 | CPU 1
* |
* 1 | 5
* - | 6
* - | 7 <---- max recorded
* Flush every events below timestamp 4

So in the first round, CPU 0 has no event by the time we read it. Then while
we read the events from CPU 1 (val 2,3,4), a new one comes in concurrently
in CPU 0 (val 1, which matches 's' in your example ). We missed it due to the
linear mmap read on all buffers so we'll get it on the second round.

We find it out in the second round, CPU 1 has also new events. At this time we know that
if CPU 0 had events up to timestamp 4, we should have seen all of them because
we read CPU 0 buffer in PASS n + 1 after we read CPU 1 buffer on PASS n.

Of course that's what happens in a perfect world with the assumption that ordering
is all correct, that events write/commit doesn't take too much time to complete,
that perf_clock() is globally monotonic (and it's not IIUC). But a little heuristical
correction on the timestamp barrier should prevent from issues with that.

So this is how the code behaves currently and it should handle a case like above.
Now there is still the problem of:

1) local timestamps not moving forward (could it happen when events happen in storm,
when they overflow multiple times in once for example, and clock is not granular
Anyway this should be solved with the patch that takes the earliest last event on all
CPU buffer instead of the maximum of a round as a barrier.

2) local timestamps not monotonic due to interrupting events. This could be fixed
in the kernel with moving perf_clock() snapshot in perf_output_sample().
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at