[PATCH tip/core/rcu 0/22] v2 Improvements to rcu_barrier() and RT response on big systems

From: Paul E. McKenney
Date: Fri Jun 22 2012 - 11:27:08 EST


Hello!

This patch series contains improvements to the rcu_barrier() family
of primitives and to scheduling latency on large systems. The two sets
of changes are posted as a single series because they would otherwise
conflict. This is an update of version 1, posted at:
https://lkml.org/lkml/2012/6/15/509
The individual patches are as follows:

1.  Allow the value for RCU_FANOUT_LEAF to be increased (but not
    decreased!) via a boot-time parameter, in turn allowing a
    default kernel build to be adjusted for low RCU grace-period
    initialization latency on a large system.
2.  Stop flagging a four-level rcu_node hierarchy as "experimental".
3.  Work around the new default NR_CPUS=4096 by checking the
    boot-time-computed nr_cpu_ids, allowing this to override
    NR_CPUS. This again reduces RCU grace-period initialization
    latency for kernels built with large NR_CPUS running on small
    systems.
4.  Shrink a macro argument to keep lines under 80 characters.
5.  Add a pointer in the rcu_state structure to the corresponding
    member of the call_rcu() family of functions in preparation
    for increasing rcu_barrier() concurrency.
6.  Move _rcu_barrier()'s rcu_head structures to the per-CPU
    per-RCU-flavor rcu_data structures so that different flavors
    of rcu_barrier() do not need to contend for the rcu_head
    structures.
7.  Move rcu_barrier()'s rcu_barrier_cpu_count global variable to
    a new ->barrier_cpu_count field in the rcu_state structure, so
    that different flavors of rcu_barrier() do not need to contend
    for this variable.
8.  Move rcu_barrier()'s rcu_barrier_completion global variable to
    a new ->barrier_completion field in the rcu_state structure, so
    that different flavors of rcu_barrier() do not need to contend
    for this variable.
9.  Move rcu_barrier()'s rcu_barrier_mutex global variable to
    a new ->barrier_mutex field in the rcu_state structure, so that
    different flavors of rcu_barrier() do not need to contend for
    this variable.
10. Remove redundant initialization to zero of the rcu_state
    structure's ->n_force_qs and ->n_force_qs_ngp fields.
11. Introduce a counter scheme to allow multiple concurrent
    executions of a given flavor of rcu_barrier() to share work.
12. Add event tracing for _rcu_barrier().
13. Add debugfs tracing for _rcu_barrier().
14. Remove unnecessary per-CPU variable argument from
    __rcu_process_callbacks().
15. Introduce for_each_rcu_flavor() iterator and use it. This
    provides a nicer way to iterate through the RCU flavors to do
    per-flavor processing.
16. Apply the for_each_rcu_flavor() iterator to debugfs tracing.
17. Remove dead-code gcc helper from code that is no longer ever
    dead.
18. Move RCU grace-period initialization into a kthread.
19. Allow RCU grace-period initialization to be preempted, including
    cond_resched() preemption points for CONFIG_PREEMPT=n systems.
20. Move RCU grace-period cleanup into a kthread.
21. Allow RCU grace-period cleanup to be preempted, including
    cond_resched() preemption points for CONFIG_PREEMPT=n systems.
22. Prevent offline CPUs from executing RCU core code.
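
To illustrate items 1 and 3, here is a minimal userspace sketch (not
code from the patches) of how the shape of an rcu_node-style tree can
follow from a CPU count and two fanouts. The names rcu_levels_needed()
and rcu_leaf_nodes(), the MAX_LEVELS cap, and the fanout values 16 and
64 used in the note below are assumptions chosen for illustration.

```c
/* Assumed cap, echoing the four-level hierarchy mentioned in item 2. */
#define MAX_LEVELS 4

/*
 * Hypothetical helper: return the number of tree levels needed to cover
 * ncpus CPUs when each leaf node handles up to fanout_leaf CPUs and each
 * interior node handles up to fanout children. Returns -1 if the
 * arguments are invalid or MAX_LEVELS levels are not enough.
 */
static int rcu_levels_needed(unsigned long ncpus,
			     unsigned long fanout_leaf,
			     unsigned long fanout)
{
	unsigned long capacity = fanout_leaf; /* CPUs a one-level tree covers */
	int levels;

	if (ncpus == 0 || fanout_leaf == 0 || fanout < 2)
		return -1;
	for (levels = 1; levels <= MAX_LEVELS; levels++) {
		if (ncpus <= capacity)
			return levels;
		capacity *= fanout; /* each added level multiplies coverage */
	}
	return -1;
}

/*
 * Hypothetical helper: number of leaf nodes needed for ncpus CPUs,
 * rounding up. The caller must pass a nonzero fanout_leaf.
 */
static unsigned long rcu_leaf_nodes(unsigned long ncpus,
				    unsigned long fanout_leaf)
{
	return (ncpus + fanout_leaf - 1) / fanout_leaf;
}
```

Under these assumptions, sizing for 4096 CPUs with a leaf fanout of 16
and an interior fanout of 64 needs three levels and 256 leaf nodes,
while sizing for 8 CPUs needs one level and one leaf, so sizing by
nr_cpu_ids rather than NR_CPUS leaves fewer nodes to initialize for
each grace period. Raising the leaf fanout to 64 reduces the 4096-CPU
case to two levels and 64 leaves.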

Changes from version 1:

o   Apply review comments from Josh Triplett and Steven Rostedt.
o   Add patches 18-22 to reduce scheduling-latency spikes from
    RCU grace-period initialization and cleanup on large systems.
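
The effect that patches 18-22 aim for can be pictured with a rough
userspace analogy, which is not code from the patches: a long
initialization loop that periodically offers the CPU to other tasks
instead of running to completion in one stretch. Here sched_yield()
merely stands in for the kernel's cond_resched(), and struct
node_sketch, init_nodes_preemptibly(), and the batch parameter are
hypothetical. The placement of the real preemption points is defined
by the patches themselves, not by this sketch.

```c
#include <sched.h>

/* Hypothetical stand-in for a per-node structure that needs setup. */
struct node_sketch {
	int initialized;
};

/*
 * Initialize n nodes, yielding the CPU after every batch nodes so that
 * other runnable tasks are not delayed by the whole loop. A batch of 0
 * disables yielding. Returns the number of times the loop yielded.
 */
static unsigned long init_nodes_preemptibly(struct node_sketch *nodes,
					    unsigned long n,
					    unsigned long batch)
{
	unsigned long i;
	unsigned long yields = 0;

	for (i = 0; i < n; i++) {
		nodes[i].initialized = 1;
		/* Offer the CPU between batches, but not after the last node. */
		if (batch && (i + 1) % batch == 0 && i + 1 < n) {
			sched_yield();
			yields++;
		}
	}
	return yields;
}
```

With 256 nodes and a batch of 64, the loop yields three times. The
point of the design is that the longest uninterrupted stretch is
bounded by the batch size rather than by the size of the system.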

Thanx, Paul

 b/Documentation/kernel-parameters.txt |    5
 b/include/trace/events/rcu.h          |   45 ++
 b/kernel/rcutree.c                    |   97 +++++
 b/kernel/rcutree.h                    |   23 -
 b/kernel/rcutree_plugin.h             |    2
 b/kernel/rcutree_trace.c              |    2
 kernel/rcutree.c                      |  555 +++++++++++++++++++++-------------
 kernel/rcutree.h                      |   25 +
 kernel/rcutree_plugin.h               |  128 -------
 kernel/rcutree_trace.c                |  155 +++++----
 10 files changed, 604 insertions(+), 433 deletions(-)
