Re: [PATCH -rcu dev 1/2] Revert b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")
From: Paul E. McKenney
Date: Wed Sep 04 2019 - 19:13:53 EST
On Wed, Sep 04, 2019 at 09:54:20AM -0400, Joel Fernandes wrote:
> On Wed, Sep 04, 2019 at 03:12:10AM -0700, Paul E. McKenney wrote:
> > On Wed, Sep 04, 2019 at 12:59:10AM -0400, Joel Fernandes wrote:
> > > On Tue, Sep 03, 2019 at 01:02:49PM -0700, Paul E. McKenney wrote:
[ . . . ]
> > If this task gets delayed betweentimes, rcu_implicit_dynticks_qs() would
> > fail to set .rcu_need_heavy_qs because it saw it already being set,
> > even though the corresponding ->dynticks update had already happened.
> > (It might be a new grace period, given that the old grace period might
> > have ended courtesy of the atomic_add_return().)
>
> Makes sense and I agree.
>
> Also, I would really appreciate if you can correct the nits in the above
> patch we're reviewing, and apply them (if you can).
> I think, there are only 2 changes left:
> - rename special to seq.
> - reorder the rcu_need_heavy_qs write.
>
> On a related point, when I was working on the NOHZ_FULL testing I noticed a
> weird issue where rcu_urgent_qs was reset but rcu_need_heavy_qs was still
> set indefinitely. I am a bit afraid our hints are not being cleared
> appropriately and I believe I fixed a similar issue a few months ago. I
> would rather have them cleared once they are no longer needed. What do you
> think about the below patch? I did not submit it yet because I was working
> on other patches.
>
> ---8<-----------------------
>
> From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>
> Subject: [RFC] rcu/tree: Reset CPU hints when reporting a quiescent state
>
> While tracing, I am seeing cases where need_heavy_qs is still set even
> though urgent_qs was cleared, after a quiescent state is reported. One
> such case is when the softirq reports that a CPU has passed quiescent
> state.
>
> Previously in 671a63517cf9 ("rcu: Avoid unnecessary softirq when system
> is idle"), I had fixed a bug where core_needs_qs was not being cleared.
> I worry we keep running into similar situations. Let us just add a
> function to clear hints and call it from all relevant places to make the
> code more robust and avoid such stale hints which could in theory at
> least, cause false hints after the quiescent state was already reported.
>
> Tested overnight with rcutorture running for 60 minutes on all
> configurations of RCU.
>
> Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> ---
> kernel/rcu/tree.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
Excellent point! But how about if we combine it with the existing
disabling of the scheduler tick, perhaps something like the following?
Note that the FQS clearing can come from some other CPU, hence the added
{READ,WRITE}_ONCE() calls. The call is moved down in rcu_report_qs_rdp()
because something would have had to clear the bit to prevent execution
from getting there, and I believe that the other bit-clearing events
have calls to rcu_disable_urgency_upon_qs(). (But I easily could have
missed something!)
I am OK leaving RCU urgency set on offline CPUs, hence clearing things
at online time.
Thanx, Paul
------------------------------------------------------------------------
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 68ebf0eb64c8..2b74b6c94086 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -827,7 +827,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
incby = 1;
} else if (tick_nohz_full_cpu(rdp->cpu) &&
rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE &&
- rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
+ READ_ONCE(rdp->rcu_urgent_qs) && !rdp->rcu_forced_tick) {
rdp->rcu_forced_tick = true;
tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
}
@@ -892,11 +892,15 @@ void rcu_irq_enter_irqson(void)
}
/*
- * If the scheduler-clock interrupt was enabled on a nohz_full CPU
- * in order to get to a quiescent state, disable it.
+ * If any sort of urgency was applied to the current CPU (for example,
+ * the scheduler-clock interrupt was enabled on a nohz_full CPU) in order
+ * to get to a quiescent state, disable it.
*/
-void rcu_disable_tick_upon_qs(struct rcu_data *rdp)
+void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
{
+ WRITE_ONCE(rdp->core_needs_qs, false);
+ WRITE_ONCE(rdp->rcu_urgent_qs, false);
+ WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
rdp->rcu_forced_tick = false;
@@ -1417,7 +1421,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
need_gp = !!(rnp->qsmask & rdp->grpmask);
rdp->cpu_no_qs.b.norm = need_gp;
- rdp->core_needs_qs = need_gp;
+ WRITE_ONCE(rdp->core_needs_qs, need_gp);
zero_cpu_stall_ticks(rdp);
}
rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
@@ -1987,7 +1991,6 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
return;
}
mask = rdp->grpmask;
- rdp->core_needs_qs = false;
if ((rnp->qsmask & mask) == 0) {
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
} else {
@@ -1998,7 +2001,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
if (!offloaded)
needwake = rcu_accelerate_cbs(rnp, rdp);
- rcu_disable_tick_upon_qs(rdp);
+ rcu_disable_urgency_upon_qs(rdp);
rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
/* ^^^ Released rnp->lock */
if (needwake)
@@ -2022,7 +2025,7 @@ rcu_check_quiescent_state(struct rcu_data *rdp)
* Does this CPU still need to do its part for current grace period?
* If no, return and let the other CPUs do their part as well.
*/
- if (!rdp->core_needs_qs)
+ if (!READ_ONCE(rdp->core_needs_qs))
return;
/*
@@ -2316,7 +2319,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
rdp = per_cpu_ptr(&rcu_data, cpu);
if (f(rdp)) {
mask |= bit;
- rcu_disable_tick_upon_qs(rdp);
+ rcu_disable_urgency_upon_qs(rdp);
}
}
}
@@ -3004,7 +3007,7 @@ static int rcu_pending(void)
return 0;
/* Is the RCU core waiting for a quiescent state from this CPU? */
- if (rdp->core_needs_qs && !rdp->cpu_no_qs.b.norm)
+ if (READ_ONCE(rdp->core_needs_qs) && !rdp->cpu_no_qs.b.norm)
return 1;
/* Does this CPU have callbacks ready to invoke? */
@@ -3244,7 +3247,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
rdp->gp_seq = rnp->gp_seq;
rdp->gp_seq_needed = rnp->gp_seq;
rdp->cpu_no_qs.b.norm = true;
- rdp->core_needs_qs = false;
rdp->rcu_iw_pending = false;
rdp->rcu_iw_gp_seq = rnp->gp_seq - 1;
trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
@@ -3359,7 +3361,7 @@ void rcu_cpu_starting(unsigned int cpu)
rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
- rcu_disable_tick_upon_qs(rdp);
+ rcu_disable_urgency_upon_qs(rdp);
/* Report QS -after- changing ->qsmaskinitnext! */
rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
} else {