On Fri, 19 Jan 2024 20:49:36 -0500
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
Let's say we have the following ktime_get() values (monotonic timestamp value) for
a sequence of events:
                               Timestamp (Hex)   Encoding in the trace

Packet header timestamp begin  0x12345678        64-bit: 0x12345678
Event 1                        0x12345678        16-bit: 0x5678
   (When decoded, same value as previous timestamp, no overflow.)
Event 2                        0x12347777        16-bit: 0x7777
   (When decoded, going from "0x5678" to "0x7777" does not overflow 16-bit.)
Event 3                        0x12350000        16-bit: 0x0000
   (When decoded, going from "0x7777" to "0x0000" overflows 16-bit exactly
   once, which allows the trace reader to reconstruct timestamp 0x12350000
   from the previous timestamp and the 16-bit timestamp encoding.)
Event 4                        0x12370000        64-bit: 0x12370000
   (Encoding over 16-bit is not possible because going from 0x12350000 to
   0x12370000 would overflow 16-bit twice, which cannot be detected by a
   trace reader. Therefore use the full 64-bit timestamp in the "large"
   event header representation.)
I think that's basically what I said, but you are just looking at it
differently ;-) Or should I say, you are using bits for optimization.
Based on your explanation below, we are really talking about different things
here. Let me reply to your explanation point by point to show where what I am
doing completely differs from what you have in mind. This will also help
explain how I handle 16-bit overflow.
The events are based off of the last injected timestamp.
Incorrect. There is no "injected timestamp". There is only a concept
of the "current timestamp" as we either write to or read from the
event stream. I will take the point of view of the trace reader for
the rest of the discussion.
The above example starts with a timestamp injection into the packet header:
0x12345678, with the 16 least-significant bits ignored.
Wrong again. The 16 least significant bits are not ignored. The "current timestamp"
is really 0x12345678 when the packet header is read.
In the packet header you have 0x12345678, and in the first event you have
0x5678; how does that get you the timestamp? If that event had 0x8888,
when the reader reads this packet, it would take the header value 0x12345678,
chop off (ignore) the 0x5678, and add the 0x8888, right?
But how do you detect the overflow? The last timestamp needs to be saved
and compared in order to know whether the tsc overflowed or not. I would
assume you have a race similar to the one I have.
Yes, I save a "last timestamp" per buffer, but the race does not
matter because of the order in which it is saved with respect to the
local cmpxchg updating the reserved position.
The algorithm looks like:
do {
	- read current reserved position (old pos)
	- read time
	- compute new reserved position (new pos)
} while (cmpxchg(reserved pos, old pos, new pos) != old pos);

[A]

save_last_tsc()
So the last_tsc that is saved is from the timestamp read before the
cmpxchg.
If interrupted at [A] by another trace producer, it will compare with
an older "last tsc" than the tsc of the event physically located just
before the nested event. This stale "last tsc" is necessarily lower
than the one we would be saving with the save_last_tsc immediately
afterwards, which means in the worst case we end up using a full
64-bit timestamp when in fact we could have used a more compact
representation. But this race is rare, so it does not matter for size.
That's equivalent to me "injecting" an absolute value for the same race.
The fact that I only need this last_tsc value for the sake of
optimization, and not for computation of a time delta from a previous
injected timestamp, makes it possible to handle the race gracefully
without requiring anything more than a single last_tsc value per
buffer and a single comparison for 16-bit overflow.
If you have:
do {
	- read current reserved position (old pos)
	- read time
	- compute new reserved position (new pos)
} while (cmpxchg(reserved pos, old pos, new pos) != old pos);

[A]
And here you have an interrupt that runs for several minutes (yeah, bad
code ;-) and it does the same thing:
do {
	- read current reserved position (old pos)
	- read time
	- compute new reserved position (new pos)
} while (cmpxchg(reserved pos, old pos, new pos) != old pos);

save_last_tsc()
But that last_tsc from before the cmpxchg is much later than the one
read by the context it interrupted. Then the interrupted context
resumes and does its own, stale:

save_last_tsc()
Ah, so if the saved clock value goes backwards here, it will just cause
the next event to inject a full timestamp again, because the delta
appears too big. This is how you are able to avoid the "before/after"
logic I have, as the race is automatically detected. The least
significant bits of the timestamp are ignored for the event delta
calculation.
And if a race happens where the interrupting event saves a later
timestamp and comes back here, and the interrupted event then writes
the older timestamp, it just causes that delta calculation to overflow
again and you inject another 64-bit timestamp into the buffer.
Going through a transition to change it could end up being just as
complex. I'm not sure the complexity of that transition is better than
the complexity of the current code, as this code has been there for 15
years, and I know of at least 2 other projects that depend on this
format as is.