Re: [RFC PATCH v8 1/9] Restartable sequences system call

From: Boqun Feng
Date: Mon Aug 29 2016 - 22:01:36 EST

On Mon, Aug 29, 2016 at 03:16:52PM +0000, Mathieu Desnoyers wrote:
> ----- On Aug 27, 2016, at 12:22 AM, Josh Triplett josh@xxxxxxxxxxxxxxxx wrote:
> > On Thu, Aug 25, 2016 at 05:56:25PM +0000, Ben Maurer wrote:
> >> rseq opens up a whole world of algorithms to userspace: algorithms
> >> that are O(num CPUs) and where one can have an extremely fast fastpath
> >> at the cost of a slower slow path. Many of these algorithms are in use
> >> in the kernel today: per-cpu allocators, RCU, light-weight reader
> >> writer locks, etc. Even in cases where these APIs can be implemented
> >> today, an rseq implementation is often superior in terms of
> >> predictability and usability (e.g. per-thread counters consume more
> >> memory and are more expensive to read than per-cpu counters).
> >>
> >> Isn't the large number of uses of rseq-like algorithms in the kernel a
> >> pretty substantial sign that there would be demand for similar
> >> algorithms by user-space systems programmers?
> >
> > Yes and no. It provides a substantial sign that such algorithms could
> > and should exist; however "someone should do this" doesn't demonstrate
> > that someone *will*. I do think we need a concrete example of a
> > userspace user with benchmark numbers that demonstrate the value of this
> > approach.
> >
> > Mathieu, do you have a version of URCU that can use rseq to work per-CPU
> > rather than per-thread? URCU's data structures would work as a
> > benchmark.
> I currently don't have a per-cpu flavor of liburcu. All the flavors are
> per-thread, because currently the alternative requires atomic operations
> on the fast-path. We could indeed re-implement something similar to SRCU
> (although under LGPLv2.1 license). I've looked at what would be required
> over the weekend, and it seems feasible, but in the short term my customers
> expect me to focus my work on speeding up the LTTng-UST tracer per-cpu
> ring buffer by adapting it to rseq. Completing the liburcu per-cpu flavor
> will be in my spare time for now.

Just for your information.

I have been working on the new SRCU-like flavor of liburcu since last
week, but it took me a while to understand the directory architecture of

So far I have only written implementations of rcu_read_{un}lock() and
synchronize_rcu(), and it can only run the simplest multiflavor test case.
My plan is to post the code and some numbers (on x86 and ppc) by the end of this

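For readers unfamiliar with the SRCU-style scheme being discussed, here is a minimal, hypothetical sketch (not the implementation mentioned above, and not liburcu's API) of a two-index, per-CPU counter flavor written with plain C11 atomics. It shows why, without rseq, the read-side fast path needs an atomic read-modify-write plus full fences: a thread may migrate between sched_getcpu() and the increment, or share a counter slot with another CPU. All names (srcu_sketch, SKETCH_MAX_CPUS, readers_done, ...) are made up for illustration; it assumes glibc on Linux for sched_getcpu() and ignores counter wraparound.

```c
#define _GNU_SOURCE          /* for sched_getcpu() (glibc, Linux) */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_CPUS 64   /* hypothetical fixed bound; CPU ids are folded into it */

/* Per-CPU slot: separate lock/unlock counts for each of the two indexes. */
struct percpu_count {
	_Atomic uint64_t lock[2];
	_Atomic uint64_t unlock[2];
};

struct srcu_sketch {
	atomic_uint active;          /* index that new readers use: 0 or 1 */
	pthread_mutex_t gp_lock;     /* serializes grace periods */
	struct percpu_count cpu[SKETCH_MAX_CPUS];
};

static void srcu_sketch_init(struct srcu_sketch *sp)
{
	atomic_init(&sp->active, 0);
	pthread_mutex_init(&sp->gp_lock, NULL);
	for (int c = 0; c < SKETCH_MAX_CPUS; c++) {
		for (int i = 0; i < 2; i++) {
			atomic_init(&sp->cpu[c].lock[i], 0);
			atomic_init(&sp->cpu[c].unlock[i], 0);
		}
	}
}

/* Only a hint: the thread may migrate right after this returns. */
static int current_cpu_slot(void)
{
	int cpu = sched_getcpu();
	return (cpu < 0 ? 0 : cpu) % SKETCH_MAX_CPUS;
}

static unsigned srcu_sketch_read_lock(struct srcu_sketch *sp)
{
	unsigned idx = atomic_load(&sp->active) & 1;
	/* Without rseq, migration or a shared slot forces an atomic RMW here. */
	atomic_fetch_add(&sp->cpu[current_cpu_slot()].lock[idx], 1);
	atomic_thread_fence(memory_order_seq_cst); /* count before critical section */
	return idx;
}

static void srcu_sketch_read_unlock(struct srcu_sketch *sp, unsigned idx)
{
	assert(idx < 2);
	atomic_thread_fence(memory_order_seq_cst); /* critical section before count */
	/* May land on a different slot than the matching lock; only sums matter. */
	atomic_fetch_add(&sp->cpu[current_cpu_slot()].unlock[idx], 1);
}

/* Sum unlocks first, then locks: equal sums mean no counted reader is still inside. */
static bool readers_done(struct srcu_sketch *sp, unsigned idx)
{
	uint64_t unlocks = 0, locks = 0;
	for (int c = 0; c < SKETCH_MAX_CPUS; c++)
		unlocks += atomic_load(&sp->cpu[c].unlock[idx]);
	atomic_thread_fence(memory_order_seq_cst);
	for (int c = 0; c < SKETCH_MAX_CPUS; c++)
		locks += atomic_load(&sp->cpu[c].lock[idx]);
	return locks == unlocks;
}

static void wait_for_readers(struct srcu_sketch *sp, unsigned idx)
{
	while (!readers_done(sp, idx))
		sched_yield();
}

static void srcu_sketch_synchronize(struct srcu_sketch *sp)
{
	pthread_mutex_lock(&sp->gp_lock);
	atomic_thread_fence(memory_order_seq_cst); /* prior unlinking before scans */
	unsigned idx = atomic_load(&sp->active) & 1;
	wait_for_readers(sp, idx ^ 1);      /* stragglers that sampled the index before the last flip */
	atomic_store(&sp->active, idx ^ 1); /* new readers switch to the other index */
	wait_for_readers(sp, idx);          /* readers that sampled the old index */
	atomic_thread_fence(memory_order_seq_cst);
	pthread_mutex_unlock(&sp->gp_lock);
}
```

A reader would bracket its accesses with idx = srcu_sketch_read_lock(&s); ... srcu_sketch_read_unlock(&s, idx);, and an updater would unlink an object, call srcu_sketch_synchronize(&s), then free it. The first wait in synchronize catches readers that sampled the index before a previous flip but incremented late. The atomic_fetch_add() on the read side is the kind of fast-path atomic operation referred to earlier; the presumed appeal of rseq is that this per-CPU increment could instead be a plain increment inside a restartable sequence.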

> I expect the liburcu per-cpu flavor to improve the slow path in many-threads
> use-cases (smaller grace period overhead), but not the fast path much,
> except perhaps by allowing faster memory reclaim in update-heavy workloads,
> which could then lead to better use of the cache even for reads.


