Re: [PATCH -rcu dev 1/2] Revert b8c17e6664c4 ("rcu: Maintain special bits at bottom of ->dynticks counter")

From: Paul E. McKenney
Date: Thu Sep 05 2019 - 12:44:28 EST


On Thu, Sep 05, 2019 at 11:36:20AM -0400, Joel Fernandes wrote:
> On Wed, Sep 04, 2019 at 04:13:08PM -0700, Paul E. McKenney wrote:
> > On Wed, Sep 04, 2019 at 09:54:20AM -0400, Joel Fernandes wrote:
> > > On Wed, Sep 04, 2019 at 03:12:10AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Sep 04, 2019 at 12:59:10AM -0400, Joel Fernandes wrote:
> > > > > On Tue, Sep 03, 2019 at 01:02:49PM -0700, Paul E. McKenney wrote:
> >
> > [ . . . ]
> >
> > > > If this task gets delayed betweentimes, rcu_implicit_dynticks_qs() would
> > > > fail to set .rcu_need_heavy_qs because it saw it already being set,
> > > > even though the corresponding ->dynticks update had already happened.
> > > > (It might be a new grace period, given that the old grace period might
> > > > have ended courtesy of the atomic_add_return().)
> > >
> > > Makes sense and I agree.
> > >
> > > Also, I would really appreciate if you can correct the nits in the above
> > > patch we're reviewing, and apply them (if you can).
> > > I think, there are only 2 changes left:
> > > - rename special to seq.
> > > - reorder the rcu_need_heavy_qs write.
> > >
> > > On a related point, when I was working on the NOHZ_FULL testing I noticed a
> > > weird issue where rcu_urgent_qs was reset but rcu_need_heavy_qs was still
> > > set indefinitely. I am a bit afraid our hints are not being cleared
> > > appropriately and I believe I fixed a similar issue a few months ago. I
> > > would rather have them cleared once they are no longer needed. What do you
> > > think about the below patch? I did not submit it yet because I was working
> > > on other patches.
> > >
> > > ---8<-----------------------
> > >
> > > From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>
> > > Subject: [RFC] rcu/tree: Reset CPU hints when reporting a quiescent state
> > >
> > > While tracing, I am seeing cases where need_heavy_qs is still set even
> > > though urgent_qs was cleared, after a quiescent state is reported. One
> > > such case is when the softirq reports that a CPU has passed quiescent
> > > state.
> > >
> > > Previously in 671a63517cf9 ("rcu: Avoid unnecessary softirq when system
> > > is idle"), I had fixed a bug where core_needs_qs was not being cleared.
> > > I worry we keep running into similar situations. Let us just add a
> > > function to clear hints and call it from all relevant places to make the
> > > code more robust and avoid such stale hints which could in theory at
> > > least, cause false hints after the quiescent state was already reported.
> > >
> > > Tested overnight with rcutorture running for 60 minutes on all
> > > configurations of RCU.
> > >
> > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > > ---
> > > kernel/rcu/tree.c | 17 ++++++++++++++++-
> > > 1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > Excellent point! But how about if we combine it with the existing
> > disabling of the scheduler tick, perhaps something like the following?
> >
> > Note that the FQS clearing can come from some other CPU, hence the added
> > {READ,WRITE}_ONCE() calls. The call is moved down in rcu_report_qs_rdp()
> > because something would have had to clear the bit to prevent execution
> > from getting there, and I believe that the other bit-clearing events
> > have calls to rcu_disable_urgency_upon_qs(). (But I easily could have
> > missed something!)
>
> Is there any harm just clearing it earlier in rcu_report_qs_rdp()? If no,
> then let us just play it safe and do it that way (clear earlier in
> rcu_report_qs_rdp())?

Maybe...

But given that missing a path doesn't cause a major failure (too-short
grace period, for example), I am more inclined to find the paths and
fix them as needed. Especially given that my ignorance of any path to
a quiescent state likely hides a serious bug.

> > I am OK leaving RCU urgency set on offline CPUs, hence clearing things
> > at online time.
>
> Got it, probably this point can be added to the commit message.
>
> Added more comments below but otherwise it looks good to me:
>
> > ------------------------------------------------------------------------
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index 68ebf0eb64c8..2b74b6c94086 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -827,7 +827,7 @@ static __always_inline void rcu_nmi_enter_common(bool irq)
> > incby = 1;
> > } else if (tick_nohz_full_cpu(rdp->cpu) &&
> > rdp->dynticks_nmi_nesting == DYNTICK_IRQ_NONIDLE &&
> > - rdp->rcu_urgent_qs && !rdp->rcu_forced_tick) {
> > + READ_ONCE(rdp->rcu_urgent_qs) && !rdp->rcu_forced_tick) {
> > rdp->rcu_forced_tick = true;
> > tick_dep_set_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> > }
> > @@ -892,11 +892,15 @@ void rcu_irq_enter_irqson(void)
> > }
> >
> > /*
> > - * If the scheduler-clock interrupt was enabled on a nohz_full CPU
> > - * in order to get to a quiescent state, disable it.
> > + * If any sort of urgency was applied to the current CPU (for example,
> > + * the scheduler-clock interrupt was enabled on a nohz_full CPU) in order
> > + * to get to a quiescent state, disable it.
> > */
> > -void rcu_disable_tick_upon_qs(struct rcu_data *rdp)
> > +void rcu_disable_urgency_upon_qs(struct rcu_data *rdp)
> > {
> > + WRITE_ONCE(rdp->core_needs_qs, false);
> > + WRITE_ONCE(rdp->rcu_urgent_qs, false);
> > + WRITE_ONCE(rdp->rcu_need_heavy_qs, false);
>
> Better to put a comment here saying _ONCE is needed to avoid data-races with
> the FQS loop? Just so if anyone thinks why we are using _ONCE().

Good point. I added a "// WRITE_ONCE() for FQS".

> And I am guessing the __this_cpu_read(rcu_data.core_needs_qs) in
> rcu_flavor_sched_clock_irq() implies READ_ONCE() so no need READ_ONCE()
> there right?

Assembly in x86. Not so much on other architectures, though. ;-)
See raw_cpu_generic_read().

> > if (tick_nohz_full_cpu(rdp->cpu) && rdp->rcu_forced_tick) {
> > tick_dep_clear_cpu(rdp->cpu, TICK_DEP_BIT_RCU);
> > rdp->rcu_forced_tick = false;
> > @@ -1417,7 +1421,7 @@ static bool __note_gp_changes(struct rcu_node *rnp, struct rcu_data *rdp)
> > trace_rcu_grace_period(rcu_state.name, rnp->gp_seq, TPS("cpustart"));
> > need_gp = !!(rnp->qsmask & rdp->grpmask);
> > rdp->cpu_no_qs.b.norm = need_gp;
> > - rdp->core_needs_qs = need_gp;
> > + WRITE_ONCE(rdp->core_needs_qs, need_gp);
> > zero_cpu_stall_ticks(rdp);
> > }
> > rdp->gp_seq = rnp->gp_seq; /* Remember new grace-period state. */
> > @@ -1987,7 +1991,6 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
> > return;
> > }
> > mask = rdp->grpmask;
> > - rdp->core_needs_qs = false;
> > if ((rnp->qsmask & mask) == 0) {
> > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > } else {
> > @@ -1998,7 +2001,7 @@ rcu_report_qs_rdp(int cpu, struct rcu_data *rdp)
> > if (!offloaded)
> > needwake = rcu_accelerate_cbs(rnp, rdp);
> >
> > - rcu_disable_tick_upon_qs(rdp);
> > + rcu_disable_urgency_upon_qs(rdp);
> > rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > /* ^^^ Released rnp->lock */
> > if (needwake)
> > @@ -2022,7 +2025,7 @@ rcu_check_quiescent_state(struct rcu_data *rdp)
> > * Does this CPU still need to do its part for current grace period?
> > * If no, return and let the other CPUs do their part as well.
> > */
> > - if (!rdp->core_needs_qs)
> > + if (!READ_ONCE(rdp->core_needs_qs))
> > return;
> >
> > /*
> > @@ -2316,7 +2319,7 @@ static void force_qs_rnp(int (*f)(struct rcu_data *rdp))
> > rdp = per_cpu_ptr(&rcu_data, cpu);
> > if (f(rdp)) {
> > mask |= bit;
> > - rcu_disable_tick_upon_qs(rdp);
> > + rcu_disable_urgency_upon_qs(rdp);
> > }
> > }
> > }
> > @@ -3004,7 +3007,7 @@ static int rcu_pending(void)
> > return 0;
> >
> > /* Is the RCU core waiting for a quiescent state from this CPU? */
> > - if (rdp->core_needs_qs && !rdp->cpu_no_qs.b.norm)
> > + if (READ_ONCE(rdp->core_needs_qs) && !rdp->cpu_no_qs.b.norm)
> > return 1;
> >
> > /* Does this CPU have callbacks ready to invoke? */
> > @@ -3244,7 +3247,6 @@ int rcutree_prepare_cpu(unsigned int cpu)
> > rdp->gp_seq = rnp->gp_seq;
> > rdp->gp_seq_needed = rnp->gp_seq;
> > rdp->cpu_no_qs.b.norm = true;
> > - rdp->core_needs_qs = false;
>
> How about calling the new hint-clearing function here as well? Just for
> robustness and consistency purposes?

This and the next function are both called during a CPU-hotplug online
operation, so there is little robustness or consistency to be had by
doing it twice.

Thanx, Paul

> thanks,
>
> - Joel
>
> > rdp->rcu_iw_pending = false;
> > rdp->rcu_iw_gp_seq = rnp->gp_seq - 1;
> > trace_rcu_grace_period(rcu_state.name, rdp->gp_seq, TPS("cpuonl"));
> > @@ -3359,7 +3361,7 @@ void rcu_cpu_starting(unsigned int cpu)
> > rdp->rcu_onl_gp_seq = READ_ONCE(rcu_state.gp_seq);
> > rdp->rcu_onl_gp_flags = READ_ONCE(rcu_state.gp_flags);
> > if (rnp->qsmask & mask) { /* RCU waiting on incoming CPU? */
> > - rcu_disable_tick_upon_qs(rdp);
> > + rcu_disable_urgency_upon_qs(rdp);
> > /* Report QS -after- changing ->qsmaskinitnext! */
> > rcu_report_qs_rnp(mask, rnp, rnp->gp_seq, flags);
> > } else {