Re: [PATCH 3/6] perf: add reference time event

From: David Ahern
Date: Mon Aug 15 2011 - 00:06:15 EST


On 08/08/2011 01:30 PM, Arnaldo Carvalho de Melo wrote:
>>>> The answer to the 'why' is that putting a reference timestamp in the
>>>> header field does not work for file appends across reboots, i.e., the case:
>>>> perf record --tod ...
>>>> reboot
>>>> perf record -A --tod ...
>>> Damn append mode. I doubt that thing is really used. And it just complexifies
>>> everything. It might be wise to get rid of it?
>>> Ingo, Peter, Arnaldo?
>>>> perf_clock timestamps change across reboots so the reference time
>>>> created by the first invocation is not valid for the append case. The
>>>> discussion then drifted towards having a kernel side event which per
>>>> past patch sets has its own issues.
>>>> So to summarize the options proposed to date and issues with the proposals:
>>>> 1. reference timestamp in header
>>>> - does not work for appends across reboots
>>>> 2. synthesized events
>>>> - preference against them
>>>> 3. kernel side event
>>>> - cannot generate an initial sample (with counter value and
>>>> perf_clock timestamp) on demand - e.g., start of session; a proposal to
>>>> use an ioctl to add one to the event stream was shot down
>>>> At this point the only idea that comes to mind is to use a combination
>>>> of 2 and 3: add the kernel side clock event
>>>> (, read the realtime clock counter,
>>>> read the monotonic clock timestamp (ie., perf_clock value), and
>>>> synthesize a perf sample that is written to the file. The append case
>>>> (with mismatch in --tod options between record invocations) would be
>>>> handled by having the kernel side clock event in the event list
>>>> (perf_evlist__equal would fail if --tod was not used for all invocations).
>>> Actually you first have to face a deeper problem. events are not stored
>>> in order in the flow, but they are sorted from perf_session__process_events().
>>> The bunch of sorted events is flushed periodically and sent to the consumer.
>>> See flush_sample_queue().
>>> And this sorting is made on top of the sample->time timestamps. So events
>>> are first sorted on sample->time and only afterward you have access to your
>>> gtod tracepoint samples. But if that gtod sample has been taken after a reboot
>>> then its sample->time is not consistent with the rest. It is not well sorted
>>> and thus the reftime won't be updated at the right moment.
>>> So the problem is that reftime update already depends on a consistent cpu
>>> timestamp.
>>> I can't think about a sane way to work around that. Sorting on gtod + cpu timestamp
>>> is not a solution because gtod can change.
>>> I'd rather propose to refuse append mode as long as we have any timestamp. That includes
>>> gtod but also sample timestamps. They are buggy if we reboot.
>> Arnaldo's sending patches, so I take it he's dug out from backlog. ;-)
>> Any objections to not allowing append mode for perf-record if samples
>> contain timestamps?
> I never used append mode, but having these restrictions on append mode
> seems counterintuitive: either we make timestamps work with
> append mode or we remove append mode completely.
> Ingo?
> - Arnaldo

Any opinion on prohibiting append mode if samples contain timestamps? To
summarize: perf_clock is reset on reboot, which breaks sample ordering
in the append case. We can either remove the append option entirely or
disallow it when samples have timestamps.
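
To make the ordering problem concrete, here is a toy model in Python. It is
not perf code: Sample, sort_by_time and append_allowed are hypothetical names
for illustration, and the timestamps are made up. The only value taken from
the kernel headers is PERF_SAMPLE_TIME. It shows that a sort keyed on
sample->time puts samples appended after a reboot ahead of the samples
recorded first, and what a "refuse append if samples have timestamps" check
might look like.

```python
from dataclasses import dataclass

# Bit value from the perf_event_sample_format enum in perf_event.h.
PERF_SAMPLE_TIME = 1 << 2


@dataclass
class Sample:
    time: int      # perf_clock-style timestamp (ns since boot), made-up values
    session: str   # which perf-record invocation wrote it (illustration only)
    seq: int       # position of the sample in the file


def sort_by_time(samples):
    """Toy stand-in for the sample->time ordering done at processing time."""
    return sorted(samples, key=lambda s: s.time)


def append_allowed(sample_type):
    """Hypothetical guard: refuse append when samples carry timestamps."""
    return not (sample_type & PERF_SAMPLE_TIME)


# First invocation: the machine has been up for a while.
first_boot = [Sample(5_000_000_000 + i * 1_000, "first", i) for i in range(3)]

# Appended after a reboot: the clock restarted, so timestamps are smaller.
after_reboot = [Sample(1_000_000_000 + i * 1_000, "appended", 3 + i)
                for i in range(3)]

file_order = first_boot + after_reboot
sorted_order = sort_by_time(file_order)

for s in sorted_order:
    print(s.session, s.seq, s.time)
```

In this model the appended samples sort ahead of everything recorded before
the reboot, so a reference time carried in the stream would be applied at the
wrong point.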
