[PATCH 07/11] rcu: set special.b.deferred_qs before wake_up()

From: Lai Jiangshan
Date: Thu Oct 31 2019 - 06:08:45 EST


The patch 10f39bb1b2c1
(rcu: protect __rcu_read_unlock() against scheduler-using irq handlers)
unveiled a kind of deadlock and resolved the deadlock problem by
avoiding the condition when ->rcu_read_lock_nesting is zero &&
->rcu_read_unlock_special is non-zero. To achieve it, the commit
used negative values for ->rcu_read_lock_nesting.

But now we have deferred_qs mechanism, we can defer qs rather
than persevere in reporting qs and deadlock. All we need is
setting special.b.deferred_qs before scheduler locks
such as wake_up() and leave the qs deferred and return.

After this change, rcu_read_unlock_special() is safe to be
called in any context, including nested in __rcu_read_unlock()
in interrupt.

This change is important to change ->rcu_read_lock_nesting
back to non-negative and further simplify the rcu_read_unlock().

Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
---
kernel/rcu/tree_plugin.h | 15 ++++++++++++---
1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index e612c77dc446..dbded2b8c792 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -591,6 +591,7 @@ static void rcu_read_unlock_special(struct task_struct *t)
irqs_were_disabled = irqs_disabled_flags(flags);
if (preempt_bh_were_disabled || irqs_were_disabled) {
bool exp;
+ bool deferred_qs = t->rcu_read_unlock_special.b.deferred_qs;
struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
struct rcu_node *rnp = rdp->mynode;

@@ -599,9 +600,18 @@ static void rcu_read_unlock_special(struct task_struct *t)
(rdp->grpmask & rnp->expmask) ||
tick_nohz_full_cpu(rdp->cpu);
// Need to defer quiescent state until everything is enabled.
+ // In some cases when in_interrupt() returns false,
+ // raise_softirq_irqoff() has to call wake_up(),
+ // and the !deferred_qs says that scheduler locks
+ // cannot be held, so the wakeup will be safe now.
+ // But this wake_up() may have RCU critical section nested
+ // in the scheduler locks and its rcu_read_unlock() would
+ // call rcu_read_unlock_special() and then wake_up()
+ // recursively and deadlock if deferred_qs is still false.
+ // To avoid it, deferred_qs has to be set beforehand.
+ t->rcu_read_unlock_special.b.deferred_qs = true;
if (irqs_were_disabled && use_softirq &&
- (in_interrupt() ||
- (exp && !t->rcu_read_unlock_special.b.deferred_qs))) {
+ (in_interrupt() || (exp && !deferred_qs))) {
// Using softirq, safe to awaken, and we get
// no help from enabling irqs, unlike bh/preempt.
raise_softirq_irqoff(RCU_SOFTIRQ);
@@ -620,7 +630,6 @@ static void rcu_read_unlock_special(struct task_struct *t)
irq_work_queue_on(&rdp->defer_qs_iw, rdp->cpu);
}
}
- t->rcu_read_unlock_special.b.deferred_qs = true;
local_irq_restore(flags);
return;
}
--
2.20.1