Re: [PATCH tip/core/rcu 40/40] srcu: Parallelize callback handling
From: Peter Zijlstra
Date: Thu Apr 13 2017 - 05:54:47 EST
On Wed, Apr 12, 2017 at 10:40:25AM -0700, Paul E. McKenney wrote:
> Peter Zijlstra proposed using SRCU to reduce mmap_sem contention [1],
> however, there are workloads that could result in a high volume of
> concurrent invocations of call_srcu(), which with current SRCU would
> result in excessive lock contention on the srcu_struct structure's
> ->queue_lock, which protects SRCU's callback lists. This commit therefore
> moves SRCU to per-CPU callback lists, thus greatly reducing contention.
>
> Because a given SRCU instance no longer has a single centralized callback
> list, starting grace periods and invoking callbacks each require a bit
> more work. These are handled using an srcu_node tree that is in some ways
> similar to the rcu_node trees used by RCU-bh, RCU-preempt, and RCU-sched
> (for example, the srcu_node tree shape is controlled by exactly the
> same Kconfig options and boot parameters that control the shape of the
> rcu_node tree).
>
> In addition, the old per-CPU srcu_array structure is now named srcu_data
> and contains an rcu_segcblist structure named ->srcu_cblist for its
> callbacks (and a spinlock to protect this). The srcu_struct gets
> an srcu_gp_seq that is used to associate callback segments with the
> corresponding completion-time grace-period number. These completion-time
> grace-period numbers are propagated up the srcu_node tree so that the
> grace-period workqueue handler can determine whether additional grace
> periods are needed on the one hand and where to look for callbacks that
> are ready to be invoked.
>
> The srcu_barrier() function must now wait on all instances of the
> per-CPU ->srcu_cblist. Because each ->srcu_cblist is protected
> by ->lock, srcu_barrier() can remotely add the needed callbacks.
> In theory, it could also remotely start grace periods, but this gets
> complex and racy. And interestingly enough, it is never necessary to
> start a grace period in this case because srcu_barrier() only enqueues
> a callback when a callback is already present. And a grace period has
> to have already been started for this pre-existing callback. And it is
> only the callback that srcu_barrier() needs to wait on, not any particular
> grace period. Therefore, a new rcu_segcblist_entrain() function enqueues
> the srcu_barrier() function's callback into the same segment occupied by
> the pre-existing callback. The special case where all the pre-existing
> callbacks are on a different list being invoked is handled by enqueuing
> srcu_barrier()'s callback into the RCU_DONE_TAIL segment, relying on
> the done-callbacks check that takes place after all callbacks are inovked.
>
> Note that the readers use the same algorithm as before. Note that there
> is a separate srcu_idx that tells the readers what counter to increment.
> This unfortunately cannot be combined with srcu_gp_seq because they
> need to be incremented at different times.
So one thing I've asked before I think, would it not be possible to
abstract PREEMPT_RCU and use the exact same code for PREEMPT_RCU and
SRCU ?