Re: [PATCH] rcu: Avoid to modify mask_ofl_ipi in sync_rcu_exp_select_node_cpus()

From: Joel Fernandes
Date: Tue Oct 08 2019 - 13:01:26 EST


On Tue, Oct 08, 2019 at 06:35:45PM +0200, Marco Elver wrote:
> On Tue, 8 Oct 2019 at 18:30, Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> >
> > On Tue, Oct 08, 2019 at 01:01:40PM +0800, Boqun Feng wrote:
> > > "mask_ofl_ipi" is used to iterate over the CPUs that need to be sent
> > > an IPI. However, in the IPI-sending loop, "mask_ofl_ipi" along with
> > > another variable, "mask_ofl_test", might also get modified to record
> > > which CPUs' quiescent states can be reported by
> > > sync_rcu_exp_select_node_cpus(). Two variables seem redundant for
> > > that purpose, so this patch cleans things up a little by using only
> > > "mask_ofl_test" for recording and "mask_ofl_ipi" for iteration. This
> > > improves the readability of the IPI-sending loop in
> > > sync_rcu_exp_select_node_cpus().
> > >
> > > Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
> > > ---
> >
> > Reviewed-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> >
> > thanks,
> >
> > - Joel
>
> Acked-by: Marco Elver <elver@xxxxxxxxxx>
>
> If this is the official patch for the fix to the KCSAN reported
> data-race, it'd be great to include the tag:
> Reported-by: syzbot+134336b86f728d6e55a0@xxxxxxxxxxxxxxxxxxxxxxxxx
> so the bot knows this was fixed.

It is just an optimization that came out of debugging the reported issue;
it does not (and should not) fix the issue itself.

Boqun, are you going to be posting another patch which just uses mask_ofl_ipi
in the for_each(..) loop? (without using _snap) as Paul suggested?
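
For anyone following along, here is a rough userspace sketch of the loop
shape being discussed. It is not the actual tree_exp.h code: select_cpus(),
ipi_ok[] and still_expected[] are made-up stand-ins for the
smp_call_function_single() result and the rnp->expmask check. The point is
only that the iteration mask is never written, and mask_ofl_test alone
accumulates the CPUs whose QS gets reported on their behalf.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code.
 * ipi_ok[cpu]:         stands in for smp_call_function_single() returning 0.
 * still_expected[cpu]: stands in for (rnp->expmask & mask) being set once
 *                      the CPU is known to be offline.
 */
static unsigned long select_cpus(unsigned long mask_ofl_ipi,
				 unsigned long mask_ofl_test,
				 const bool *ipi_ok,
				 const bool *still_expected,
				 int ncpus)
{
	int cpu;

	assert(ncpus > 0 && ncpus <= (int)(8 * sizeof(unsigned long)));

	for (cpu = 0; cpu < ncpus; cpu++) {
		unsigned long mask = 1UL << cpu;

		if (!(mask_ofl_ipi & mask))
			continue;	/* no IPI needed for this CPU */
		if (ipi_ok[cpu])
			continue;	/* CPU got the IPI and reports its own QS */
		if (still_expected[cpu])
			mask_ofl_test |= mask;	/* offline: report QS for it */
	}
	return mask_ofl_test;	/* CPUs whose QS the caller would report */
}
```

The retry path and the locking are left out on purpose; only the roles of
the two masks are modeled.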

Paul mentioned other places where rnp->expmask is locklessly accessed so I
think that may be fixed separately (such as the stall-warning code). Paul,
were you planning on fixing all such accesses together (other than this code)
or should I look into it more? I guess for the stall case, KCSAN would have
to trigger stalls to see those issues.
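
To illustrate what marking such a lockless read could look like, here is a
small userspace sketch. READ_ONCE_UL(), struct fake_rnp and
cpu_still_blocking() are names I made up for this example; the kernel's
real READ_ONCE() is more general than this, and the real stall-warning
code is not reproduced here.

```c
#include <assert.h>

/* Hypothetical stand-in for the kernel's READ_ONCE(); unsigned long only. */
#define READ_ONCE_UL(x) (*(const volatile unsigned long *)&(x))

/* Hypothetical stand-in for the one rcu_node field of interest. */
struct fake_rnp {
	unsigned long expmask;
};

/* Return 1 if the given CPU's bit is set in a single marked load of expmask. */
static int cpu_still_blocking(struct fake_rnp *rnp, int cpu)
{
	assert(cpu >= 0 && cpu < (int)(8 * sizeof(unsigned long)));
	/* One volatile load; the value may be stale by the time it is used. */
	return !!(READ_ONCE_UL(rnp->expmask) & (1UL << cpu));
}
```

This only makes the lockless read explicit; it does not add any ordering
or make the result any less racy with respect to concurrent updates.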

thanks,

- Joel

>
> Thanks!
> -- Marco
>
> > > kernel/rcu/tree_exp.h | 13 ++++++-------
> > > 1 file changed, 6 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index 69c5aa64fcfd..212470018752 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -387,10 +387,10 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> > > }
> > > ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
> > > put_cpu();
> > > - if (!ret) {
> > > - mask_ofl_ipi &= ~mask;
> > > + /* The CPU responses the IPI, and will report QS itself */
> > > + if (!ret)
> > > continue;
> > > - }
> > > +
> > > /* Failed, raced with CPU hotplug operation. */
> > > raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > if ((rnp->qsmaskinitnext & mask) &&
> > > @@ -401,13 +401,12 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> > > schedule_timeout_uninterruptible(1);
> > > goto retry_ipi;
> > > }
> > > - /* CPU really is offline, so we can ignore it. */
> > > - if (!(rnp->expmask & mask))
> > > - mask_ofl_ipi &= ~mask;
> > > + /* CPU really is offline, and we need its QS to pass GP. */
> > > + if (rnp->expmask & mask)
> > > + mask_ofl_test |= mask;
> > > raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > }
> > > /* Report quiescent states for those that went offline. */
> > > - mask_ofl_test |= mask_ofl_ipi;
> > > if (mask_ofl_test)
> > > rcu_report_exp_cpu_mult(rnp, mask_ofl_test, false);
> > > }
> > > --
> > > 2.23.0
> > >