Re: [PATCH tip/core/rcu 04/13] rcu: Make RCU_FANOUT_LEAF help text more explicit about skew_tick
From: Peter Zijlstra
Date: Wed Apr 19 2017 - 09:49:00 EST
On Wed, Apr 19, 2017 at 03:22:26PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 13, 2017 at 11:42:32AM -0700, Paul E. McKenney wrote:
>
> > I believe that you are missing the fact that RCU grace-period
> > initialization and cleanup walks through the rcu_node tree breadth
> > first, using rcu_for_each_node_breadth_first().
>
> Indeed. That is the part I completely missed.
>
> > This macro (shown below)
> > implements this breadth-first walk using a simple sequential traversal of
> > the ->node[] array that provides the structures making up the rcu_node
> > tree. As you can see, this scan is completely independent of how CPU
> > numbers might be mapped to rcu_data slots in the leaf rcu_node structures.
>
> So this code is clearly not a hotpath, but still its performance
> matters?
>
> Seems like you cannot win here :/

So I sort of see what that code does, but I cannot quite grasp from the
comments near there _why_ it is doing this.
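
For reference, the "what" as I read Paul's description boils down to
something like the toy below. This is only a sketch, not the kernel
code: every toy_* name and TOY_NUM_NODES are made up for illustration,
and the only property it relies on is the one Paul states, namely that
the nodes sit in one array laid out level by level, so a linear scan of
that array is a breadth-first walk.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NUM_NODES 3	/* hypothetical: 1 root followed by 2 leaves */

struct toy_rcu_node {
	int level;		/* 0 for the root, 1 for the leaves */
	unsigned long gpnum;	/* grace-period number this node has seen */
};

struct toy_rcu_state {
	/* Stored level by level: node[0] is the root, then the leaves. */
	struct toy_rcu_node node[TOY_NUM_NODES];
};

/* Array order is breadth-first order, so a linear scan is the walk. */
#define toy_for_each_node_breadth_first(sp, np) \
	for ((np) = &(sp)->node[0]; \
	     (np) < &(sp)->node[TOY_NUM_NODES]; (np)++)

/*
 * Grace-period initialization in this toy: stamp every node with the
 * new grace-period number and return how many nodes were visited.
 * Nothing here depends on how CPUs are mapped onto the leaves.
 */
static int toy_gp_init(struct toy_rcu_state *sp, unsigned long gpnum)
{
	struct toy_rcu_node *np;
	int visited = 0;

	toy_for_each_node_breadth_first(sp, np) {
		np->gpnum = gpnum;
		visited++;
	}
	return visited;
}
```

The cost of such a scan grows with the number of nodes, which is
presumably why the fanout matters even though this is not a hotpath.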

My thinking is that normal (active) CPUs will update their state at tick
time through the tree, and once the state reaches the root node, IOW all
CPUs agree they've observed that particular state, we advance the global
state; rinse, repeat. That's how tree-rcu works.
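
To make that mental model concrete, here is a minimal sketch of the
propagation I have in mind. It is a toy under stated assumptions, not
the tree-rcu implementation: a fixed two-level tree, no locking, and
hypothetical names throughout (toy_node, toy_state, toy_report_qs(),
TOY_FANOUT and friends). The qsmask/grpmask/parent fields are only
loosely modeled on the idea of per-node bitmasks of children that still
owe a report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_FANOUT	4	/* hypothetical CPUs per leaf */
#define TOY_NR_LEAVES	2	/* hypothetical leaves under the root */
#define TOY_NR_CPUS	(TOY_FANOUT * TOY_NR_LEAVES)

struct toy_node {
	unsigned long qsmask;		/* children that still owe a report */
	unsigned long grpmask;		/* this node's bit in parent->qsmask */
	struct toy_node *parent;	/* NULL for the root */
};

struct toy_state {
	struct toy_node root;
	struct toy_node leaf[TOY_NR_LEAVES];
	unsigned long gp_completed;	/* number of finished grace periods */
};

/* Start a grace period: every CPU and every leaf owes a report. */
static void toy_gp_start(struct toy_state *s)
{
	s->root.parent = NULL;
	s->root.grpmask = 0;
	s->root.qsmask = (1UL << TOY_NR_LEAVES) - 1;
	for (int i = 0; i < TOY_NR_LEAVES; i++) {
		s->leaf[i].parent = &s->root;
		s->leaf[i].grpmask = 1UL << i;
		s->leaf[i].qsmask = (1UL << TOY_FANOUT) - 1;
	}
}

/*
 * A CPU reports that it has observed the current state. The report
 * climbs only as far as it clears the last pending bit at each level.
 * Returns true if this report cleared the root, i.e. ended the grace
 * period and advanced the global state.
 */
static bool toy_report_qs(struct toy_state *s, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);

	struct toy_node *np = &s->leaf[cpu / TOY_FANOUT];
	unsigned long mask = 1UL << (cpu % TOY_FANOUT);

	for (;;) {
		if (!(np->qsmask & mask))
			return false;	/* already reported */
		np->qsmask &= ~mask;
		if (np->qsmask)
			return false;	/* siblings still pending */
		if (!np->parent)
			break;		/* root is now clear */
		mask = np->grpmask;
		np = np->parent;
	}
	s->gp_completed++;
	return true;
}
```

In this toy the only whole-tree pass is toy_gp_start(); the end of the
grace period falls out of the last report reaching the root.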

NOHZ-idle stuff would be excluded entirely; that is, if we're allowed to
go idle we're up-to-date, and completely drop out of the state tracking.
When we become active again, we can simply sync the CPU's state to the
active state and go from there -- ignoring whatever happened in the
meantime.

So why do we have to do machine-wide updates? How can we get to the end
of a grace period without all CPUs already agreeing that it's complete?

/me puzzled.