Re: [PATCH 3/6] perf: add reference time event

From: David Ahern
Date: Mon Aug 15 2011 - 00:06:15 EST


Ingo:

On 08/08/2011 01:30 PM, Arnaldo Carvalho de Melo wrote:
>>>> The answer to the 'why' is that putting a reference timestamp in the
>>>> header field does not work for file appends across reboots. ie., the case:
>>>> perf record --tod ...
>>>> reboot
>>>> perf record -A --tod ...
>>>
>>> Damn append mode. I doubt that thing is really used. And it just complexifies
>>> everything. It might be wise to get rid of it?
>>>
>>> Ingo, Peter, Arnaldo?
>>>
>>>> perf_clock timestamps change across reboots so the reference time
>>>> created by the first invocation is not valid for the append case. The
>>>> discussion then drifted towards having a kernel side event which per
>>>> past patch sets has its own issues.
>>>>
>>>> So to summarize the options proposed to date and issues with the proposals:
>>>> 1. reference timestamp in header
>>>> - does not work for appends across reboots
>>>>
>>>> 2. synthesized events
>>>> - preference against them
>>>>
>>>> 3. kernel side event
>>>> - cannot generate an initial sample (with counter value and
>>>> perf_clock timestamp) on demand - e.g., start of session; a proposal to
>>>> use an ioctl to add one to the event stream was shot down
>>>>
>>>> At this point the only idea that comes to mind is to use a combination
>>>> of 2 and 3: add the kernel side clock event
>>>> (https://lkml.org/lkml/2011/2/18/11), read the realtime clock counter,
>>>> read the monotonic clock timestamp (ie., perf_clock value), and
>>>> synthesize a perf sample that is written to the file. The append case
>>>> (with mismatch in --tod options between record invocations) would be
>>>> handled by having the kernel side clock event in the event list
>>>> (perf_evlist__equal would fail if --tod was not used for all invocations).
>>>
>>> Actually you first have to face a deeper problem. events are not stored
>>> in order in the flow, but they are sorted from perf_session__process_events().
>>>
>>> The bunch of sorted events is flushed periodically and sent to the consumer.
>>>
>>> See flush_sample_queue().
>>>
>>> And this sorting is made on top of the sample->time timestamps. So events
>>> are first sorted on sample->time and only afterward you have access to your
>>> gtod tracepoint samples. But if that gtod sample has been taken after a reboot
>>> then its sample->time is not consistent with the rest. It is not well sorted
>>> and thus the reftime won't be updated at the right moment.
>>>
>>> So the problem is that reftime update already depends on a consistent cpu
>>> timestamp.
>>>
>>> I can't think of a sane way to work around that. Sorting on gtod + cpu timestamp
>>> is not a solution because gtod can change.
>>>
>>> I'd rather propose to refuse append mode as long as we have any timestamp. That includes
>>> gtod but also sample timestamps. They are buggy if we reboot.
>>
>> Arnaldo's sending patches, so I take it he's dug out from backlog. ;-)
>>
>> Any objections to not allowing append mode for perf-record if samples
>> contain timestamps?
>
> I never used append mode, but having these restrictions on append mode
> seems counterintuitive: either we make timestamps work with
> append mode or we remove append mode completely.
>
> Ingo?
>
> - Arnaldo

Any opinion on prohibiting append mode if samples contain timestamps? To
summarize: perf_clock is reset on reboot, which breaks sample ordering
in the append case. We can either remove the append option entirely or
disallow it when samples have timestamps.
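
As a rough illustration of the ordering problem, here is a small sketch.
It is not perf code: the timestamps, the labels, and the append_allowed()
helper are all made up for the example, and the real sorting happens in
periodically flushed batches rather than in one pass. It only assumes
what the thread describes, namely that samples are ordered on their
timestamp and that the clock restarts after a reboot.

```python
# Hypothetical model: each sample is (perf_clock_ns, label).
# These are illustrative values, not perf's real data structures.
first_session = [(5_000, "A1"), (9_000, "A2")]    # recorded before the reboot
second_session = [(1_200, "B1"), (3_400, "B2")]   # appended after the reboot;
                                                  # the clock started over

# On disk the appended file holds the samples in their true order.
appended_file = first_session + second_session

# Processing orders samples on the timestamp alone, so the post-reboot
# samples end up ahead of the pre-reboot ones.
sorted_samples = sorted(appended_file, key=lambda sample: sample[0])
labels = [label for _, label in sorted_samples]
print(labels)


def append_allowed(samples_have_timestamps: bool) -> bool:
    """Sketch of the proposed policy: refuse append when samples carry timestamps."""
    return not samples_have_timestamps
```

With these values the B samples sort ahead of the A samples even though
they were recorded later, which is the misordering the second option
would avoid by refusing the append.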

Thanks,
David