Re: [ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux(repost)

From: Mathieu Desnoyers
Date: Wed Feb 11 2009 - 03:59:18 EST


* Lai Jiangshan (laijs@xxxxxxxxxxxxxx) wrote:
> Mathieu Desnoyers wrote:
> >
> > I just did an mb() version of urcu:
> >
> > (uncomment CFLAGS+=-DDEBUG_FULL_MB in the Makefile)
> >
> > Time per read : 48.4086 cycles
> > (about 6-7 times slower, as expected)
> >
>
> I have read many of Paul's papers
> (http://www.rdrop.com/users/paulmck/RCU/)
> and I know Paul worked hard to remove memory barriers from the
> RCU read side in the kernel. His work is significant.
>
> But, I think,
> 1) Userspace RCU's read side can afford the latency of
> memory barriers (including atomic operations).
> Userspace does not access shared data as frequently as the kernel does,
> and the userspace read side is not as fast as the kernel's.
>
> 2) Userspace uses RCU for its strengths, not to save a few CPU cycles
> (http://lwn.net/Articles/263130/).
> One of the most important strengths is being lock-free.
>
>
> If my thinking is right, the following suggestion is worth considering too.
>
> Use an all-system RCU for userspace RCU.
>
> The all-system RCU is QRCU, which was implemented by Paul.
> http://lwn.net/Articles/223752/
>
> Any system that has mechanisms equivalent to atomic_op,
> __wait_event, wake_up and mutex can also implement QRCU.
> So most systems can implement QRCU, which is why I call it the
> all-system RCU.
>
> Obviously, we can implement a portable QRCU quite simply with NPTL,
> and the read lock is:
> for (;;) {
>         int idx = qp->completed & 0x1;
>         if (likely(atomic_inc_not_zero(qp->ctr + idx)))
>                 return idx;
> }
> "atomic_inc_not_zero" is likely to be called only once, so it is
> fast enough.
>
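
For readers outside the kernel tree: atomic_inc_not_zero is a kernel
primitive, so the quoted loop does not compile as-is in userspace. A
rough userspace equivalent can be sketched with C11 atomics as a
compare-and-swap loop. This is only an illustration: the names
qrcu_sketch, inc_not_zero, qrcu_sketch_read_lock and
qrcu_sketch_read_unlock are made up here, and the write side (counter
flip, wait and wake-up) is omitted entirely.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's QRCU state. */
struct qrcu_sketch {
	atomic_int completed;	/* low bit selects the active ctr[] slot */
	atomic_int ctr[2];	/* reader counts, one per index */
};

/* Increment *v unless it is zero; returns true if the increment happened. */
static bool inc_not_zero(atomic_int *v)
{
	int old = atomic_load(v);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Same shape as the quoted loop: retry until a non-zero counter is found. */
static int qrcu_sketch_read_lock(struct qrcu_sketch *qp)
{
	for (;;) {
		int idx = atomic_load(&qp->completed) & 0x1;

		if (inc_not_zero(&qp->ctr[idx]))
			return idx;
	}
}

static void qrcu_sketch_read_unlock(struct qrcu_sketch *qp, int idx)
{
	atomic_fetch_sub(&qp->ctr[idx], 1);
}
```

Note that every read lock here costs at least one atomic
read-modify-write on a shared cache line, plus an unbounded retry loop
whenever the counter is observed as zero.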

Hi Lai,

There are a few reasons why we need RCU in userspace for tracing:

- We need very fast per-cpu read-side synchronization for data structure
handling. Updates are rare (enabling/disabling tracing). Therefore,
your argument about userspace not needing "fast" RCU does not hold in
this case. Note that LTTng has the performance it has today in the
kernel because I made sure to use no memory barriers where they are
unnecessary and because I used the minimum number of atomic operations
required. Those are costly synchronization primitives on quite a few
architectures.
- Being lock-free (atomic). To trace code executed in signal handlers,
we need to be able to nest over any user code. With the solution you
propose above, the busy-loop in the read lock does not seem to be
signal-safe: if it nests over a writer, it could busy-loop forever.
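
To make both points concrete, here is a minimal sketch of a read-side
lock that uses no atomic instruction and no retry loop, so it can nest
inside a signal handler without waiting on anyone. It is a simplified
illustration, not the actual urcu code: the names (gp_ctr_sketch,
active_readers_sketch, read_lock_sketch, read_unlock_sketch,
SKETCH_FULL_MB) and the bit layout are assumptions made for this
example, and the writer, which has to compensate for the missing
read-side memory barriers, is not shown.

```c
#include <stdatomic.h>

/* Hypothetical layout: low 16 bits hold the read-side nesting depth. */
#define NEST_MASK	0xffffUL
#define NEST_ONE	1UL
#define GP_PHASE	(1UL << 16)	/* phase bit a writer would flip */

/* Global grace-period counter; always carries a nesting count of one. */
static volatile unsigned long gp_ctr_sketch = NEST_ONE;

/* Per-thread reader state; nesting bits are zero outside a read section. */
static _Thread_local volatile unsigned long active_readers_sketch;

static inline void read_side_barrier(void)
{
#ifdef SKETCH_FULL_MB
	atomic_thread_fence(memory_order_seq_cst);	/* full memory barrier */
#else
	atomic_signal_fence(memory_order_seq_cst);	/* compiler barrier only */
#endif
}

static inline void read_lock_sketch(void)
{
	unsigned long tmp = active_readers_sketch;

	if (!(tmp & NEST_MASK))
		/* Outermost lock: snapshot the global counter and phase. */
		active_readers_sketch = gp_ctr_sketch;
	else
		/* Nested lock (e.g. from a signal handler): bump the depth. */
		active_readers_sketch = tmp + NEST_ONE;
	read_side_barrier();
}

static inline void read_unlock_sketch(void)
{
	read_side_barrier();
	active_readers_sketch = active_readers_sketch - NEST_ONE;
}
```

Building with -DSKETCH_FULL_MB turns the compiler barrier into a full
memory barrier, which is the kind of variant the cycle count quoted at
the top of this message was measured on. Either way, the read lock is a
handful of plain loads and stores on thread-local state and always
completes in a bounded number of steps.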

Mathieu

> Lai.
>
>
>

--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68