Re: Perf and ftrace [was Re: PyTimechart]

From: Mathieu Desnoyers
Date: Thu May 13 2010 - 13:10:27 EST


* Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> On Thu, 2010-05-13 at 12:31 -0400, Mathieu Desnoyers wrote:
> > * Steven Rostedt (rostedt@xxxxxxxxxxx) wrote:
> > > On Thu, 2010-05-13 at 09:20 -0400, Mathieu Desnoyers wrote:
> > [...]
> > > > > >
> > > > > > ...
> > > > > >
> > > > > > 97 /**
> > > > > > 98 * ring_buffer_clear_noref_flag - Clear the noref subbuffer flag, for writer.
> > > > > > 99 */
> > > > > > 100 static __inline__
> > > > > > 101 void ring_buffer_clear_noref_flag(struct ring_buffer_backend *bufb,
> > > > > > 102 unsigned long idx)
> > > > > > 103 {
> > > > > > 104 struct ring_buffer_backend_page *sb_pages, *new_sb_pages;
> > > > > > 105
> > > > > > 106 sb_pages = bufb->buf_wsb[idx].pages;
> > > > > > 107 for (;;) {
> > > > > > 108 if (!RCHAN_SB_IS_NOREF(sb_pages))
> > > > > > 109 return; /* Already writing to this buffer */
> > > > > > 110 new_sb_pages = sb_pages;
> > > > > > 111 RCHAN_SB_CLEAR_NOREF(new_sb_pages);
> > > > > > 112 new_sb_pages = cmpxchg(&bufb->buf_wsb[idx].pages,
> > > > > > 113 sb_pages, new_sb_pages);
> > > > > > 114 if (likely(new_sb_pages == sb_pages))
> > > > > > 115 break;
> > > > > > 116 sb_pages = new_sb_pages;
> > > > >
> > > > > The writer calls this??
> > > >
> > > > Yes. But the common case (for each event) is simply a
> > > > "if (!RCHAN_SB_IS_NOREF(sb_pages))" test that returns. The cmpxchg() is only
> > > > performed at subbuffer boundary.
> > >
> > > Is the cmpxchg only contending with other writers?
> >
> > No. Would have this been the case, I would have used a cmpxchg_local(). This
> > cmpxchg used to deal with subbuffer swap is touching the subbuffer "pages"
> > pointer, which can be updated concurrently by other writers as well as readers.
> >
> > The writer clears the noref flags when starting to write in a subbuffers, and
> > sets it when delivering the subbuffer (when it is fully committed).
> >
> > The reader can only ever swap the subbuffer with the one it owns if the noref
> > flag is set. The reader uses a cmpxchg() too to perform the swap.
> >
>
> So the writer can loop because of a reader? And theoretically, a
> aggressive reader that is constantly swapping pages can halt a writer?

Hehe, I'm glad you ask :) This has been one of my major concerns with this
algorithm.

Taken from my thesis:

8.6 Formal Verification of LTTng
8.6.3 Real-Time Impact

"The second synchronization added by the overwrite mode involves letting the
reader thread exchange individual sub-buffers with its own extra sub-buffer
before reading them. This exchange is performed with a CAS instruction which
verified if the use flag is set by the writer. If the sub-buffer is in use, the
reader will retry later. On the writer side, a CAS instruction is used in a
busy-loop to set and clear the use flag (for portability, because an atomic
instruction to set a mask is not available for all architecture in the Linux
kernel). Given that the reader can only ever exchange a sub-buffer and cause the
writer to restart once (further accesses to that sub-buffers are impossible,
because the reader will block on the writer position), the reader thread cannot
starve the writer."

IOW, if the reader is trying to get subbuffers like crazy, it will maybe cause
one loop restart when taking the subbuffer the writer was about to write to, but
after that, the reader will be forced to exchange the next subbuffer, and so on
until it wraps-around and comes back to the subbuffer preceding the writer
position, as stop there. Note that during all this execution, the reader could
only cause a single retry of the writer. In summary, the frontend range
restrictions guarantee that the writer will retry at most once.

Note that we could also use an atomic "set bit/clear bit" for the writer, along
with write memory barriers, when available on the architecture, and that would
remove all need for loops. However, architectures that need to emulate set
bit/clear bit with a cmpxchg() would behave pretty much in the same way as my
current code does.

>
> I thought you said you did not have loops in the writers?

Well, while technically a loop, it is a bounded single-retry, so it's OK as far
as wait-free guarantees are concerned. We could even put a WARN_ON() there to
ensure that we catch double retry that should never happen, but that would be at
the cost of a runtime veritication. A DEBUG_TRACER_RT config option could select
this though.

Thanks,

Mathieu

--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/