Re: [PATCH] rcu: Avoid to modify mask_ofl_ipi in sync_rcu_exp_select_node_cpus()

From: Boqun Feng
Date: Tue Oct 08 2019 - 22:20:38 EST


On Tue, Oct 08, 2019 at 01:01:21PM -0400, Joel Fernandes wrote:
> On Tue, Oct 08, 2019 at 06:35:45PM +0200, Marco Elver wrote:
> > On Tue, 8 Oct 2019 at 18:30, Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Tue, Oct 08, 2019 at 01:01:40PM +0800, Boqun Feng wrote:
> > > > "mask_ofl_ipi" is used to iterate over the CPUs to which IPIs need to
> > > > be sent. However, in the IPI-sending loop, "mask_ofl_ipi", along with
> > > > another variable, "mask_ofl_test", may also be modified to record
> > > > which CPUs' quiescent states can be reported by
> > > > sync_rcu_exp_select_node_cpus(). Two variables are redundant for such
> > > > a purpose, so this patch cleans things up a little by using only
> > > > "mask_ofl_test" for recording and "mask_ofl_ipi" for iteration. This
> > > > improves the readability of the IPI-sending loop in
> > > > sync_rcu_exp_select_node_cpus().
> > > >
> > > > Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
> > > > ---
> > >
> > > Reviewed-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > >
> > > thanks,
> > >
> > > - Joel
> >
> > Acked-by: Marco Elver <elver@xxxxxxxxxx>
> >

Thank you both!

> > If this is the official fix for the KCSAN-reported data race, it'd be
> > great to include the tag:
> > Reported-by: syzbot+134336b86f728d6e55a0@xxxxxxxxxxxxxxxxxxxxxxxxx
> > so the bot knows this was fixed.
>
> It is just an optimization that came out of debugging the reported
> issue; it does not (should not) fix the issue.
>

Right.

> Boqun, are you going to post another patch that just uses mask_ofl_ipi
> in the for_each(..) loop (without using _snap), as Paul suggested?
>

IIUC, Paul already has this fix, along with other ->expmask fixes, queued
in his dev branch with the proper "Reported-by" tag to give syzbot credit:

https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/commit/?h=dev&id=4e4fefe0630dcf7415d62e6d9171c8f209444376

Regards,
Boqun

> Paul mentioned other places where rnp->expmask is accessed locklessly
> (such as the stall-warning code), so I think those may be fixed
> separately. Paul, were you planning on fixing all such accesses (other
> than this code) together, or should I look into it more? I guess for the
> stall case, KCSAN would have to trigger stalls to see those issues.
>
> thanks,
>
> - Joel
>
> >
> > Thanks!
> > -- Marco
> >
> > > > kernel/rcu/tree_exp.h | 13 ++++++-------
> > > > 1 file changed, 6 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > > index 69c5aa64fcfd..212470018752 100644
> > > > --- a/kernel/rcu/tree_exp.h
> > > > +++ b/kernel/rcu/tree_exp.h
> > > > @@ -387,10 +387,10 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> > > >  		}
> > > >  		ret = smp_call_function_single(cpu, rcu_exp_handler, NULL, 0);
> > > >  		put_cpu();
> > > > -		if (!ret) {
> > > > -			mask_ofl_ipi &= ~mask;
> > > > +		/* The CPU responds to the IPI and will report its QS itself. */
> > > > +		if (!ret)
> > > >  			continue;
> > > > -		}
> > > > +
> > > >  		/* Failed, raced with CPU hotplug operation. */
> > > >  		raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > >  		if ((rnp->qsmaskinitnext & mask) &&
> > > > @@ -401,13 +401,12 @@ static void sync_rcu_exp_select_node_cpus(struct work_struct *wp)
> > > >  			schedule_timeout_uninterruptible(1);
> > > >  			goto retry_ipi;
> > > >  		}
> > > > -		/* CPU really is offline, so we can ignore it. */
> > > > -		if (!(rnp->expmask & mask))
> > > > -			mask_ofl_ipi &= ~mask;
> > > > +		/* CPU really is offline, and its QS is needed for the GP to end. */
> > > > +		if (rnp->expmask & mask)
> > > > +			mask_ofl_test |= mask;
> > > >  		raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > >  	}
> > > >  	/* Report quiescent states for those that went offline. */
> > > > -	mask_ofl_test |= mask_ofl_ipi;
> > > >  	if (mask_ofl_test)
> > > >  		rcu_report_exp_cpu_mult(rnp, mask_ofl_test, false);
> > > >  }
> > > > --
> > > > 2.23.0
> > > >