[PATCH tip/core/rcu 09/14] rcu: Speed up expedited GPs when interrupting RCU reader

From: Paul E. McKenney
Date: Sun Nov 11 2018 - 14:29:57 EST


In PREEMPT kernels, an expedited grace period might send an IPI to a
CPU that is executing an RCU read-side critical section. In that case,
it would be nice if the rcu_read_unlock() directly interacted with the
RCU core code to immediately report the quiescent state. And this does
happen in the case where the reader has been preempted. But it would
also be a nice performance optimization if immediate reporting also
happened in the preemption-free case.

This commit therefore adds an ->exp_hint field to the task_struct structure's
->rcu_read_unlock_special field. The IPI handler sets this hint when
it has interrupted an RCU read-side critical section, and this causes
the outermost rcu_read_unlock() call to invoke rcu_read_unlock_special(),
which, if preemption is enabled, reports the quiescent state immediately.
If preemption is disabled, then the report is required to be deferred
until preemption (or bottom halves or interrupts or whatever) is re-enabled.

Because this is a hint, it does nothing for more complicated cases. For
example, if the IPI interrupts an RCU reader, but interrupts are disabled
across the rcu_read_unlock(), but another rcu_read_lock() is executed
before interrupts are re-enabled, the hint will already have been cleared.
If you do crazy things like this, reporting will be deferred until some
later RCU_SOFTIRQ handler, context switch, cond_resched(), or similar.

Reported-by: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
Acked-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
---
include/linux/sched.h | 4 +++-
kernel/rcu/tree_exp.h | 4 +++-
kernel/rcu/tree_plugin.h | 14 +++++++++++---
3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index a51c13c2b1a0..e4c7b6241088 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -572,8 +572,10 @@ union rcu_special {
struct {
u8 blocked;
u8 need_qs;
+ u8 exp_hint; /* Hint for performance. */
+ u8 pad; /* No garbage from compiler! */
} b; /* Bits. */
- u16 s; /* Set of bits. */
+ u32 s; /* Set of bits. */
};

enum perf_event_task_context {
diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index e669ccf3751b..928fe5893a57 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -692,8 +692,10 @@ static void sync_rcu_exp_handler(void *unused)
*/
if (t->rcu_read_lock_nesting > 0) {
raw_spin_lock_irqsave_rcu_node(rnp, flags);
- if (rnp->expmask & rdp->grpmask)
+ if (rnp->expmask & rdp->grpmask) {
rdp->deferred_qs = true;
+ WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
+ }
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
}

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 05915e536336..618956cc7a55 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -642,13 +642,21 @@ static void rcu_read_unlock_special(struct task_struct *t)

local_irq_save(flags);
irqs_were_disabled = irqs_disabled_flags(flags);
- if ((preempt_bh_were_disabled || irqs_were_disabled) &&
- t->rcu_read_unlock_special.b.blocked) {
+ if (preempt_bh_were_disabled || irqs_were_disabled) {
+ WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
/* Need to defer quiescent state until everything is enabled. */
- raise_softirq_irqoff(RCU_SOFTIRQ);
+ if (irqs_were_disabled) {
+ /* Enabling irqs does not reschedule, so... */
+ raise_softirq_irqoff(RCU_SOFTIRQ);
+ } else {
+ /* Enabling BH or preempt does reschedule, so... */
+ set_tsk_need_resched(current);
+ set_preempt_need_resched();
+ }
local_irq_restore(flags);
return;
}
+ WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, false);
rcu_preempt_deferred_qs_irqrestore(t, flags);
}

--
2.17.1