Re: [PATCH, RFC, tip/core/rcu] scalable classic RCU implementation

From: Manfred Spraul
Date: Sun Aug 24 2008 - 14:25:30 EST


Paul E. McKenney wrote:
+ */
+struct rcu_node {
+ spinlock_t lock;
+ unsigned long qsmask; /* CPUs or groups that need to switch in */
+ /* order for current grace period to proceed.*/
+ unsigned long qsmaskinit;
+ /* Per-GP initialization for qsmask. */
I'm not sure if a bitmap is the right storage. If I understand the code correctly, it contains two pieces of information:
1) If the bitmap is clear, then all CPUs have completed whatever they need to do.
A counter is more efficient than a bitmap. In particular, it would allow choosing the optimal fan-out independent of whether a long is 32 or 64 bits.
2) Whether the current CPU must do something to complete the current grace period.
This is local information: usually (always?) only the current CPU needs to know whether it must do something.
So it doesn't need to be stored in a shared structure; it could be stored in a per-CPU structure.
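A minimal single-threaded sketch of the two bookkeeping schemes being compared, with no locking and a single tree level. All names (node_bitmap, node_counter, last_reported_gp, the *_gp_start and *_cpu_quiet functions) are hypothetical and not taken from the patch. The bitmap records *which* children still owe a quiescent state, so reporting twice is harmless; the counter records only *how many*, so each CPU must remember locally whether it already reported in the current grace period.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: not the patch's rcu_node layout, no locking. */
#define SKETCH_NR_CPUS 8

/* Bitmap variant: one bit per child, so fan-out <= bits in a long. */
struct node_bitmap {
	unsigned long qsmask;     /* children that still owe a quiescent state */
	unsigned long qsmaskinit; /* reloaded at the start of each grace period */
};

/* Counter variant: fan-out is not tied to the width of a long. */
struct node_counter {
	int remaining;            /* children that still owe a quiescent state */
	int nchildren;
};

/* Per-CPU state: last grace period this CPU reported for.
 * Grace-period numbers are assumed to start at 1. */
static long last_reported_gp[SKETCH_NR_CPUS];

static void bitmap_gp_start(struct node_bitmap *n)
{
	n->qsmask = n->qsmaskinit;
}

/* Returns true once every child has reported; clearing a bit twice is harmless. */
static bool bitmap_cpu_quiet(struct node_bitmap *n, int cpu)
{
	n->qsmask &= ~(1UL << cpu);
	return n->qsmask == 0;
}

static void counter_gp_start(struct node_counter *n)
{
	n->remaining = n->nchildren;
}

/*
 * The counter cannot tell which CPUs have reported, so each CPU keeps
 * that knowledge locally to avoid decrementing twice in one grace period.
 */
static bool counter_cpu_quiet(struct node_counter *n, int cpu, long gpnum)
{
	if (last_reported_gp[cpu] != gpnum) {
		last_reported_gp[cpu] = gpnum;
		assert(n->remaining > 0);
		n->remaining--;
	}
	return n->remaining == 0;
}
```

Note that the counter variant gives up the ability to ask a node which CPUs are still outstanding.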

I am using the bitmap in force_quiescent_state() to work out which CPUs'
dynticks state to check and which CPUs to send reschedule IPIs to. I could
scan all of the per-CPU rcu_data structures, but am assuming that after a
few jiffies there would typically be relatively few CPUs still needing to
pass through a quiescent state. Given this assumption, on systems with large
numbers of CPUs, scanning the bitmask greatly reduces the number of cache
misses compared to scanning the rcu_data structures.
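A rough sketch of the cache-miss argument, assuming a single rcu_node-style mask word covering 16 CPUs. The percpu_sketch structure and both scan functions are hypothetical, and the ipis_sent counter merely stands in for the dynticks check and reschedule IPI. Scanning every per-CPU record touches all of them; scanning the shared mask tests bits in one word and touches only the records of CPUs that are still pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: 16 CPUs fit in an unsigned long on 32- and 64-bit. */
#define SKETCH_NR_CPUS 16

/* Stand-in for a per-CPU rcu_data; each access is a potential cache miss. */
struct percpu_sketch {
	bool needs_qs;   /* CPU still owes a quiescent state */
	int ipis_sent;   /* stand-in for dynticks check / reschedule IPI */
};

static struct percpu_sketch percpu[SKETCH_NR_CPUS];

/* Scan every per-CPU record: always touches SKETCH_NR_CPUS records. */
static int force_qs_scan_all(void)
{
	int touched = 0;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		touched++;
		if (percpu[cpu].needs_qs)
			percpu[cpu].ipis_sent++;
	}
	return touched;
}

/* Scan the shared mask: bit tests hit one word; only pending CPUs are touched. */
static int force_qs_scan_mask(unsigned long qsmask)
{
	int touched = 0;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (!(qsmask & (1UL << cpu)))
			continue;
		touched++;
		percpu[cpu].ipis_sent++;
	}
	return touched;
}
```

With a counter instead of a bitmap, force_quiescent_state() would have to fall back to something like force_qs_scan_all(), which is the trade-off at issue in this thread.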

It's an optimization question: which is rarer, force_quiescent_state() or "normal" cpu_quiet() calls?
You have optimized for force_quiescent_state(); I have optimized for "normal" cpu_quiet() calls. [OK, I admit: force_quiescent_state() is still missing in my code.]
Do you have any statistics?

--
Manfred