Re: [RFC][PATCH] ring-buffer: Have nested events still record running time stamp
From: Steven Rostedt
Date: Thu Jun 25 2020 - 15:58:56 EST
On Thu, 25 Jun 2020 15:35:02 -0400 (EDT)
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:
> >
> > Well, write_stamp is updated via local64, which I believe handles this
> > for us. I probably should make before_stamp handle it as well.
>
> Looking at the local64 headers, it appears that 32-bit architectures rely on
> atomic64, which on x86 is implemented with LOCK; cmpxchg8b for 586+ (which is
> AFAIK painfully slow) and with cli/sti for 386/486 (which is not NMI-safe).
>
> For all other 32-bit architectures, the generic atomic64.h implements 64-bit
> atomics using spinlocks with IRQs off, which also seems to bring considerable
> overhead, in addition to being non-reentrant with respect to NMI-like
> interrupts, e.g. FIQ on ARM32.
>
> That seems at odds with the performance constraints of ftrace's ring buffer.
>
> Those performance and reentrancy concerns are why I always stick to local_t
> (long), and never use a full 64-bit type for anything that has to
> do with concurrent store/load between execution contexts in LTTng.
If this is an issue, I'm sure I can make my own wrappers for
"time_local()" and implement something similar to what you probably do,
because we only need to worry about the lower 32 bits wrapping, which
only happens roughly every 4 seconds (2^32 ns is about 4.29 s). But that
is an implementation detail; it doesn't affect the correctness of the
overall design.
Still, it is something I should definitely look into.
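A minimal userspace sketch of the point about only the lower 32 bits wrapping: if a reader has a full 64-bit reference stamp that is known to be within 2^31 ns (about 2.1 s) of the value being reconstructed, a stored 32-bit word is enough to rebuild the full 64-bit time stamp. `ts_extend()` and the 2^31 ns window are assumptions for illustration, not the wrappers the ring buffer actually uses; the signed cast relies on the two's-complement conversion that GCC and Clang implement.

```c
#include <assert.h>
#include <stdint.h>

/* Lower 32 bits of a nanosecond stamp wrap every 2^32 ns (about 4.29 s). */
#define TS_LOW_PERIOD_NS (1ULL << 32)
static_assert(TS_LOW_PERIOD_NS == 4294967296ULL, "2^32 ns");

/*
 * Hypothetical helper: rebuild a 64-bit time stamp from its stored lower
 * 32 bits, given a full reference stamp assumed to be within 2^31 ns of it.
 */
static uint64_t ts_extend(uint64_t ref, uint32_t low)
{
	/* Signed distance (mod 2^32) from the reference's low bits to ours. */
	int32_t delta = (int32_t)(low - (uint32_t)ref);

	/* Adding a negative delta wraps correctly in unsigned arithmetic. */
	return ref + (uint64_t)(int64_t)delta;
}
```

For example, with `ref` 16 ns past a wrap and a stored value 16 ns before it, `ts_extend()` subtracts 32 and lands back in the previous 4.29 s period.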
>
> >
> >
> >>
> >> > * a full time stamp (this can turn into a time extend, which is
> >> > * just an extended time delta but fills up the extra space).
> >> > */
> >> > if (after != before)
> >> > abs = true;
> >> >
> >> > ts = clock();
> >> >
> >> > /* Now update the before_stamp (everyone does this!) */
> >> > [B] WRITE_ONCE(before_stamp, ts);
> >> >
> >> > /* Read the current next_write and update it to what we want write
> >> > * to be after we reserve space. */
> >> > next = READ_ONCE(next_write);
> >> > WRITE_ONCE(next_write, w + len);
> >> >
> >> > /* Now reserve space on the buffer */
> >> > [C] write = local_add_return(len, write_tail);
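The reservation steps in the quoted excerpt can be modeled in userspace C11 roughly as below. This is a hypothetical single-context sketch, not the patch: `struct rb_sketch`, `rb_reserve()` and `rb_commit()` are made-up names, `now` stands in for `clock()`, the `before`/`after` reads are assumed to happen earlier (they are not shown in the excerpt), the `next_write` step is omitted because `w` is not defined in the excerpt, and `rb_commit()` assumes a later step publishes the stamp to `write_stamp`. It uses C11 64-bit atomics for brevity, which is exactly the type the 32-bit discussion above warns about, and it does not model interrupt or NMI interleavings.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU state; field names follow the quoted pseudocode. */
struct rb_sketch {
	_Atomic uint64_t before_stamp;
	_Atomic uint64_t write_stamp;
	atomic_ulong write_tail;	/* long-sized, like local_t */
};

/* Hypothetical result of one reservation. */
struct rb_resv {
	unsigned long start;	/* offset where the event's space begins */
	uint64_t ts;		/* time stamp taken for this event */
	bool abs;		/* true: record a full time stamp, not a delta */
};

static struct rb_resv rb_reserve(struct rb_sketch *rb, unsigned long len,
				 uint64_t now /* stands in for clock() */)
{
	struct rb_resv r;
	/* Assumed to be read earlier in the patch; not shown in the excerpt. */
	uint64_t before = atomic_load_explicit(&rb->before_stamp, memory_order_relaxed);
	uint64_t after = atomic_load_explicit(&rb->write_stamp, memory_order_relaxed);

	assert(len > 0);

	/*
	 * Stamps disagree: some writer updated before_stamp but has not
	 * published write_stamp yet, so a delta from write_stamp is unsafe.
	 */
	r.abs = (after != before);
	r.ts = now;

	/* [B] Everyone updates before_stamp. */
	atomic_store_explicit(&rb->before_stamp, r.ts, memory_order_relaxed);

	/*
	 * [C] Reserve space. local_add_return() yields the new tail;
	 * fetch_add yields the old one, i.e. the start of our space.
	 */
	r.start = atomic_fetch_add_explicit(&rb->write_tail, len, memory_order_relaxed);
	return r;
}

/* Hypothetical commit step (not in the excerpt): publish the event's stamp. */
static void rb_commit(struct rb_sketch *rb, const struct rb_resv *r)
{
	atomic_store_explicit(&rb->write_stamp, r->ts, memory_order_relaxed);
}
```

If an earlier writer updated `before_stamp` but never published its stamp, the two stamps disagree and the next writer falls back to an absolute time stamp; once a commit brings them back in sync, deltas are used again.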
> >>
> >> So the reservation is not "just" an add instruction; it's actually an
> >> xadd on x86. Is that really faster than a cmpxchg?
> >
> > I believe the answer is still yes. But I can run some benchmarks to
> > make sure.
>
> This would be interesting to see, because if xadd and cmpxchg have
> similar overhead, then going for a cmpxchg-loop for the space
> reservation could vastly decrease the overall complexity of this
> timestamp+space reservation algorithm.
It would most likely cause userspace breakage, and that would be a show
stopper.
But still good to see.
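For reference, the two reservation styles being compared can be sketched with C11 atomics: a single fetch-and-add (compilers typically emit `lock xadd` on x86 when the result is used) and a compare-and-swap retry loop (`lock cmpxchg`). One plausible source of the simplification Mathieu mentions: the loop can read the clock inside each attempt, so if a nested event reserves space in between, the cmpxchg fails and the outer writer retries with a fresh stamp, keeping stamps in buffer order given a monotonic clock. The function names and `fake_clock()` are hypothetical, and the sketch says nothing about relative cost, which only a benchmark can answer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in clock for the demo: advances 10 ns per read. */
static uint64_t fake_clock(void)
{
	static uint64_t t;
	return t += 10;
}

/* Style 1: one fetch-and-add (typically "lock xadd" on x86); never retries. */
static unsigned long reserve_xadd(atomic_ulong *tail, unsigned long len)
{
	assert(len > 0);
	return atomic_fetch_add_explicit(tail, len, memory_order_relaxed);
}

/*
 * Style 2: compare-and-swap loop (typically "lock cmpxchg" on x86).
 * The clock is read inside the loop, so if a nested writer moves the
 * tail between the clock read and the CAS, the CAS fails and we retry
 * with a fresh stamp.
 */
static unsigned long reserve_cmpxchg(atomic_ulong *tail, unsigned long len,
				     uint64_t (*read_clock)(void), uint64_t *ts)
{
	unsigned long old = atomic_load_explicit(tail, memory_order_relaxed);

	assert(len > 0);
	do {
		*ts = read_clock();
		/* On failure, old is refreshed with the current tail. */
	} while (!atomic_compare_exchange_weak_explicit(tail, &old, old + len,
							memory_order_relaxed,
							memory_order_relaxed));
	return old;
}
```

`reserve_xadd()` always completes in one atomic operation but cannot tie the stamp to the slot it gets, while `reserve_cmpxchg()` ties them together at the price of a possible retry.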
Thanks for the review.
-- Steve