[RFC v2.1 1/5] rcu: Introduce for_each_leaf_node_cpu()

From: Boqun Feng
Date: Thu Dec 15 2016 - 10:21:38 EST


There are several places in the RCU core where we need to iterate over all
the bits in a leaf node's masks (->qsmask, ->expmask, etc.) in order to
visit the corresponding CPUs. The current code iterates over all possible
CPUs covered by the leaf node and then checks each one against the mask to
see whether its bit is set.

However, most bits in cpu_possible_mask are set, while usually only a few
bits in an RCU leaf node mask are set (in other words, ->qsmask and its
friends are usually much sparser than cpu_possible_mask). It is therefore
better to iterate the other way round, that is, over the bits set in the
leaf node's mask. Doing so saves several checks in the loop. Moreover,
fast-path checks (e.g. ->qsmask == 0) can then be folded into the loop
logic, because an empty mask simply yields no iterations.

This patch introduces for_each_leaf_node_cpu() to iterate over the mask
bits in this more efficient way.

By design, the CPUs whose bits are set in a leaf node's masks should be a
subset of the possible CPUs, so no extra cpu_possible() check should be
needed. However, a WARN_ON_ONCE() is added to catch any nasty cases we may
have missed, and such an "impossible" CPU is skipped if one shows up.

Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
---
kernel/rcu/tree.h | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)

diff --git a/kernel/rcu/tree.h b/kernel/rcu/tree.h
index c0a4bf8f1ed0..b35da5b5dab1 100644
--- a/kernel/rcu/tree.h
+++ b/kernel/rcu/tree.h
@@ -295,6 +295,25 @@ struct rcu_node {
cpu <= rnp->grphi; \
cpu = cpumask_next((cpu), cpu_possible_mask))

+
+#define MASK_BITS(mask) (BITS_PER_BYTE * sizeof(mask))
+/*
+ * Iterate over all CPUs in a leaf RCU node whose bits are still set in
+ * @mask.
+ *
+ * Note that @rnp must be a leaf node and @mask must belong to @rnp. We
+ * also assume that every CPU set in @mask is set in cpu_possible_mask. IOW,
+ * masks of a leaf node never set a bit for an "impossible" CPU.
+ */
+#define for_each_leaf_node_cpu(rnp, mask, cpu) \
+ for ((cpu) = (rnp)->grplo + find_first_bit(&(mask), MASK_BITS(mask)); \
+ (cpu) <= (rnp)->grphi; \
+ (cpu) = (rnp)->grplo + find_next_bit(&(mask), MASK_BITS(mask), \
+ (cpu) - (rnp)->grplo + 1)) \
+ if (WARN_ON_ONCE(!cpu_possible(cpu))) \
+ continue; \
+ else
+
/*
* Union to allow "aggregate OR" operation on the need for a quiescent
* state by the normal and expedited grace periods.
--
2.10.2