Re: [PATCH, RFC] v4 scalable classic RCU implementation

From: Manfred Spraul
Date: Sun Sep 21 2008 - 07:09:30 EST


Hi Paul,

Some further thoughts about the design differences between your implementation and mine:

- rcutree's qsmaskinit is the worst-case list of cpus that could be in rcu read side critical sections.
- rcustate's cpu_total is the accurate list of cpus that could be in rcu read side critical sections.

Both variables are read rarely: for rcustate, twice per grace period.

rcutree fixes up cpus that are "incorrectly" listed in qsmaskinit with force_quiescent_state(). This forces rcutree to use a cpu bitmask for qsmask, and it forces rcutree to store the "done" information in a global structure. Additionally, in the worst case force_quiescent_state() must loop over all cpus.
rcustate can use per-cpu structures and a global atomic_t. There is no loop over all cpus. That's a big advantage, thus I think it's worth the effort to maintain an accurate list.
Unfortunately, I don't have an efficient implementation for the accurate list.
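
To illustrate why an accurate count avoids the loop, here is a rough user-space sketch using C11 atomics rather than kernel primitives. It is not the actual rcustate code: cpus_pending, cur_gp, struct cpu_qs, gp_start() and report_qs() are hypothetical names, and NR_CPUS is fixed arbitrarily. The assumption is that the grace period starts with the accurate cpu count and that each cpu reports at most once per grace period.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 8			/* arbitrary, for the sketch only */

/* Per-cpu state: the last grace period this cpu has reported for. */
struct cpu_qs {
	unsigned long seen_gp;
};

static struct cpu_qs cpu_qs[NR_CPUS];
static atomic_int cpus_pending;		/* global counter, stands in for atomic_t */
static atomic_ulong cur_gp;		/* current grace period number */

/* Start a grace period; cpu_total must be the accurate cpu count. */
static void gp_start(int cpu_total)
{
	atomic_store(&cpus_pending, cpu_total);
	atomic_fetch_add(&cur_gp, 1);
}

/*
 * Called by a cpu that passed a quiescent state.
 * Returns true if this report completed the grace period.
 */
static bool report_qs(int cpu)
{
	unsigned long gp = atomic_load(&cur_gp);

	if (cpu_qs[cpu].seen_gp == gp)
		return false;		/* already reported for this gp */
	cpu_qs[cpu].seen_gp = gp;
	/* The cpu that takes the counter from 1 to 0 finishes the gp. */
	return atomic_fetch_sub(&cpus_pending, 1) == 1;
}
```

The point is that no cpu ever scans the others: every cpu touches only its own structure plus one shared counter. The price is that the initial value must be exact, otherwise the counter either never reaches zero or reaches it too early.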

Some random ideas:
- cpu_total is only read rarely. Thus it would be ok if the read operation is expensive [e.g. collect data from multiple cachelines, acquire spinlocks...]
- updates to cpu_total happen with every interrupt on an idle system with no_hz.
Thus it must be very scalable, preferably per-cpu data.
And: Updates are far more frequent than grace periods.
- updates to cpu_total almost never happen without no_hz.
Especially: far less frequent than grace periods.

What about adding an "invalid" flag to cpu_total? The "real" data is stored in per-cpu structures.
- when a cpu enters/leaves nohz, then it invalidates the global cpu_total and updates a per-cpu structure
- when the state machine needs the number of rcu-tracked cpus, then it checks if the global cpu_total is valid.
If it's valid, then cpu_total is used directly. Otherwise the per-cpu structures are enumerated and the new value is stored as cpu_total.
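
A minimal user-space sketch of the idea, again with C11 atomics instead of kernel primitives. Everything except the name cpu_total is made up for illustration (cpu_total_valid, struct cpu_track, cpu_set_tracked(), read_cpu_total(), NR_CPUS). It assumes a single reader at a time (e.g. the state machine holding its lock) and ignores what happens when a cpu changes state in the middle of a grace period.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 8			/* arbitrary, for the sketch only */

/* Per-cpu "real" data; in the kernel this would be per-cpu, cacheline aligned. */
struct cpu_track {
	atomic_bool tracked;		/* false while the cpu is in nohz idle */
};

static struct cpu_track cpu_track[NR_CPUS];
static atomic_int cpu_total;		/* cached global count */
static atomic_bool cpu_total_valid;	/* the "invalid" flag, inverted */

/* Called when a cpu enters (tracked == false) or leaves (true) nohz. */
static void cpu_set_tracked(int cpu, bool tracked)
{
	atomic_store(&cpu_track[cpu].tracked, tracked);
	/* Only write the shared cacheline if it is currently marked valid. */
	if (atomic_load(&cpu_total_valid))
		atomic_store(&cpu_total_valid, false);
}

/* Called rarely by the state machine; assumed to be serialized. */
static int read_cpu_total(void)
{
	int cpu, n = 0;

	if (atomic_load(&cpu_total_valid))
		return atomic_load(&cpu_total);

	/*
	 * Mark valid before scanning: an update that races with the scan
	 * invalidates again, so the next read recounts.
	 */
	atomic_store(&cpu_total_valid, true);
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (atomic_load(&cpu_track[cpu].tracked))
			n++;
	atomic_store(&cpu_total, n);
	return n;
}
```

With frequent nohz transitions the flag stays invalid and the updates only read the shared cacheline, so the write traffic stays per-cpu. The expensive scan happens only on the rare read, which is the trade-off described above.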

What do you think?

--
Manfred