Re: [RFC][PATCH] ring-buffer: Have nested events still record running time stamp

From: Mathieu Desnoyers
Date: Thu Jun 25 2020 - 15:35:06 EST


----- On Jun 25, 2020, at 2:35 PM, rostedt rostedt@xxxxxxxxxxx wrote:

> On Thu, 25 Jun 2020 13:55:07 -0400 (EDT)
> Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote
>
>> > Here's the design of this solution:
>> >
>> > All this is per cpu, and only needs to worry about nested events (not
>> > parallel events).
>> >
>> > The players:
>> >
>> > write_tail: The index in the buffer where new events can be written to.
>> > It is incremented via local_add() to reserve space for a new event.
>> >
>> > before_stamp: A time stamp set by all events before reserving space.
>> >
>> > write_stamp: A time stamp updated by events after it has successfully
>> > reserved space.
>>
>> What are the types used for before_stamp and write_stamp ? If those
>> are 64-bit integers, how does sharing them between nested contexts
>> work on 32-bit architectures ?
>
> Well, write_stamp is updated via local64, which I believe handles this
> for us. I probably should make before_stamp handle it as well.

By looking at local64 headers, it appears that 32-bit rely on atomic64,
which on x86 is implemented with LOCK; cmpxchg8b for 586+ (which is AFAIK
painfully slow) and with cli/sti for 386/486 (which is not nmi-safe).

For all other 32-bit architectures, the generic atomic64.h implements 64-bit
atomics using spinlocks with irqs off, which seems to also bring considerable
overhead, in addition to be non-reentrant with respect to NMI-like interrupts,
e.g. FIQ on ARM32.

That seems at odds with the performance constraints of ftrace's ring buffer.

Those performance and reentrancy concerns are why I always stick to local_t
(long), and never use a full 64-bit type for anything that has to
do with concurrent store/load between execution contexts in lttng.

>
>
>>
>> > * a full time stamp (this can turn into a time extend which
>> > is
>> > * just an extended time delta but fill up the extra space).
>> > */
>> > if (after != before)
>> > abs = true;
>> >
>> > ts = clock();
>> >
>> > /* Now update the before_stamp (everyone does this!) */
>> > [B] WRITE_ONCE(before_stamp, ts);
>> >
>> > /* Read the current next_write and update it to what we want
>> > write
>> > * to be after we reserve space. */
>> > next = READ_ONCE(next_write);
>> > WRITE_ONCE(next_write, w + len);
>> >
>> > /* Now reserve space on the buffer */
>> > [C] write = local_add_return(len, write_tail);
>>
>> So the reservation is not "just" an add instruction, it's actually an
>> xadd on x86. Is that really faster than a cmpxchg ?
>
> I believe the answer is still yes. But I can run some benchmarks to
> make sure.

This would be interesting to see, because if xadd and cmpxchg have
similar overhead, then going for a cmpxchg-loop for the space
reservation could vastly decrease the overall complexity of this
timestamp+space reservation algorithm.

If we decrease complexity of the fast-paths, we can then reduce the
amount of test+branches which need to be performed and instruction cache
being used, and this can lead to performance improvements. So it's not only
good in making things more easily verifiable, but it also improves
performance.

[...]
>
>
>>
>> > } else {
>> > /* slow path */
>> > if (w == next) {
>>
>> If it's a slow path, why try to play games with a delta timestamp ?
>> Space should not be an issue for a slow path like this. It would be
>> simpler to just use the full timestamp here.
>
> Hmm, you may be right. Previous iterations of this code had a distinct
> difference here, but after restructuring it, I don't think that
> distinction is valid anymore. If anything, having this at least lets me
> validate that it's doing what I believe it should be doing (as I added
> a ton of trace_printk()s into this).

Good. Simple is good. :-)

>
>>
>> > /* This event preempted the previous
>> > event
>> > * just after it reserved its
>> > buffer.
>>
>> You mean nesting after [C] but before [D].
>
> Yes. I can add that for clarity, but perhaps I don't need that if I
> merge the two.

OK

>
>>
>> > * It's timestamp should be
>> > "before_stamp" */
>>
>> It's -> Its
>
> ;-)
>
> My email client messed up your formatting of the rest of the email, so
> I'll send a separate reply.

OK,

Thanks!

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com