Re: [RELEASE] Userspace RCU 0.3.0

From: Paul E. McKenney
Date: Wed Nov 04 2009 - 01:23:23 EST


On Tue, Nov 03, 2009 at 11:53:14AM -0500, Mathieu Desnoyers wrote:
> * Paul E. McKenney (paulmck@xxxxxxxxxxxxxxxxxx) wrote:
> > On Tue, Nov 03, 2009 at 10:02:34AM -0500, Mathieu Desnoyers wrote:
> > > Hi everyone,
> > >
> > > I released userspace RCU 0.3.0, which includes a small API change to
> > > the "deferred work" interface. After discussion with Paul, I decided
> > > to drop support for call_rcu() and only provide defer_rcu(), to make
> > > sure I don't provide an API with the same name as the kernel RCU one
> > > but with different arguments and semantics. Any use of call_rcu() now
> > > generates the following linker error:
> > >
> > > file.c:240: undefined reference to
> > > `__error_call_rcu_not_implemented_please_use_defer_rcu'
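
(As an aside: one way to get that kind of link-time error is to hide
call_rcu() behind a macro that expands to a call to a function that is
declared but never defined. A minimal sketch, not necessarily what
liburcu 0.3.0 actually does:)

    /* Declared here, intentionally never defined anywhere. */
    void __error_call_rcu_not_implemented_please_use_defer_rcu(void);

    /* Any use of call_rcu() still compiles, but fails at link time. */
    #define call_rcu(head, func) \
            __error_call_rcu_not_implemented_please_use_defer_rcu()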
> > >
> > > Note that defer_rcu() should *not* be used within an RCU read-side
> > > C.S., because it calls synchronize_rcu() if the queue is full. This
> > > is a major distinction from call_rcu(). (Note to self: eventually we
> > > should add some self-check code to detect defer_rcu() nested within
> > > an RCU read-side C.S.)
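
(On the note-to-self: assuming the library keeps a per-thread read-side
nesting count -- the counter name below is made up for illustration --
the self-check could be as simple as an assertion at the top of
defer_rcu():)

    #include <assert.h>

    /* Hypothetical per-thread counter, incremented by rcu_read_lock()
     * and decremented by rcu_read_unlock(). */
    extern __thread int rcu_reader_nesting;

    static inline void assert_not_in_read_side_cs(void)
    {
            /* defer_rcu() may call synchronize_rcu(), so it must never
             * be reached from within a read-side critical section. */
            assert(rcu_reader_nesting == 0);
    }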
> > >
> > > I plan to eventually implement a proper call_rcu() within the userspace
> > > RCU library. It's not, however, a short-term need for me at the moment.
> >
> > I can tell that we need to get you going on some real-time work. ;-)
>
> :-)
>
> > (Sorry, but I really couldn't resist!)
>
> It's true that this becomes important when real-time behavior is
> required at the call_rcu() execution site. However, even typical use of
> call_rcu() has limitations in this area: when the struct rcu_head passed
> to call_rcu() is allocated dynamically, kmalloc and friends do not offer
> any wait-free/lock-free guarantees. So call_rcu() effectively pushes the
> RT burden onto the original struct rcu_head allocation. But I agree that
> it makes out-of-memory/queue-full error handling much easier, because
> all the allocation is done at a single site.
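
(For readers following along: the usual kernel idiom embeds the rcu_head
in the structure being freed, so the only allocation -- and hence the
only non-deterministic step -- happens when the object is first created.
A sketch, with struct foo invented for the example:)

    #include <linux/rculist.h>
    #include <linux/slab.h>

    struct foo {
            struct list_head list;
            int data;
            struct rcu_head rcu;  /* embedded: nothing to allocate at
                                     call_rcu() time */
    };

    static void foo_free_rcu(struct rcu_head *head)
    {
            kfree(container_of(head, struct foo, rcu));
    }

    static void foo_del(struct foo *fp)
    {
            list_del_rcu(&fp->list);           /* unpublish */
            call_rcu(&fp->rcu, foo_free_rcu);  /* free after a grace period */
    }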
>
> The main disadvantage of the call_rcu() approach, though, is that I
> cannot see any clean way to perform call_rcu() rate limiting on a
> per-CPU basis. This basically implies that we would have to stop
> providing an RT call_rcu() at some point to ensure we do not go over a
> certain threshold.

Or we could use other means to accelerate the grace period when any given
CPU's callback queue starts filling up, such as force_quiescent_state().
Now, force_quiescent_state() is not exactly lightweight, but at that
point we should not be all that concerned about incurring some extra
overhead.
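
Very roughly, and from memory rather than from the actual source, the
shape of that approach is as follows. All names below are simplified
stand-ins for the real tree-RCU machinery, not the kernel's API:

    extern unsigned long qhimark;   /* per-CPU queue high-water mark */
    /* Hypothetical helper: enqueue on this CPU, return new queue length. */
    extern unsigned long enqueue_on_this_cpu(struct rcu_head *head);
    extern void force_quiescent_state(void);   /* real one takes arguments */

    static void call_rcu_sketch(struct rcu_head *head,
                                void (*func)(struct rcu_head *))
    {
            unsigned long qlen;

            head->func = func;
            head->next = NULL;
            qlen = enqueue_on_this_cpu(head);

            /* Past the high-water mark, poke the grace-period machinery
             * instead of failing or blocking the caller. */
            if (qlen > qhimark)
                    force_quiescent_state();
    }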

Now, an RCU read-side critical section might take forever, but then
you are stuck no matter what you do. And this is why SRCU has a
separate API that does not include a call_srcu().

> A possible solution would be to make call_rcu() return an error when it
> goes over some threshold. The caller would have to deal with the error,
> possibly by rejecting the whole operation (so that maybe another
> CPU/cloud node could take over the work). This seems cleaner than
> delaying execution at the call_rcu() site. The caller could then decide
> either to reject the whole operation or to delay its execution.
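
To make that concrete, here is what a hypothetical error-returning
variant and the caller-side handling it forces might look like, reusing
the struct foo example from above. No call_rcu_tryqueue() exists; it is
purely illustrative:

    #include <linux/errno.h>

    /* Hypothetical: 0 on success, -EBUSY when this CPU's callback queue
     * is over its threshold. */
    extern int call_rcu_tryqueue(struct rcu_head *head,
                                 void (*func)(struct rcu_head *));

    static int foo_del_checked(struct foo *fp)
    {
            list_del_rcu(&fp->list);
            if (call_rcu_tryqueue(&fp->rcu, foo_free_rcu) == -EBUSY) {
                    /* Now what?  The element is already unpublished, so
                     * the caller must relink it, hand it to another CPU,
                     * or retry later -- exactly the error handling that
                     * is hard to get right and to test. */
                    return -EBUSY;
            }
            return 0;
    }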

That sort of error handling usually turns out to be surprisingly
complex, difficult to test, and prone to bugs. Having a deterministic
call_rcu() that avoids error returns is actually quite valuable.

The problem in user mode is that you cannot guarantee that a given
thread won't get preempted for an extended time period. One approach
would be to make call_rcu() provide a conditional guarantee, so
that it (for example) provides deterministic execution time only
if readers are getting done in a timely manner and if the call_rcu()
rate is bounded. But even that would prohibit call_rcu() from being
invoked from within an RCU read-side critical section.

So another approach is to test whether call_rcu() is being invoked from
within an RCU read-side critical section, and only block if not. And yet
another would be for call_rcu() to block for a fixed time period if
within an RCU read-side critical section. Either way, the system would
make forward progress as long as at least -some- of the call_rcu()
invocations came from outside of RCU read-side critical sections.
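
In pseudo-C, the first variant might look like this. enqueue_callback(),
queue_over_threshold() and the per-thread rcu_reader_nesting counter are
all hypothetical; only synchronize_rcu() is the real liburcu primitive:

    extern __thread int rcu_reader_nesting;  /* read-side nesting depth */
    extern void enqueue_callback(void (*func)(void *), void *p);
    extern int queue_over_threshold(void);

    void call_rcu_sketch(void (*func)(void *), void *p)
    {
            enqueue_callback(func, p);      /* never blocks */

            /* Only pay for a grace period when it is safe to block,
             * i.e. when the caller is not inside a read-side critical
             * section. */
            if (queue_over_threshold() && rcu_reader_nesting == 0)
                    synchronize_rcu();
    }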

Thanx, Paul

> Mathieu
>
>
> > Thanx, Paul
>
> --
> Mathieu Desnoyers
> OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68