Re: [syzbot] KCSAN: data-race in call_rcu / rcu_gp_kthread

From: Paul E. McKenney
Date: Tue Jul 20 2021 - 12:40:31 EST


On Tue, Jul 20, 2021 at 04:10:10PM +0200, Marco Elver wrote:
> On Tue, 20 Jul 2021 at 15:18, Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
> [...]
> > Good catch! And yes, this would be hard to reproduce.
> >
> > How about as shown below?
>
> Acked-by: Marco Elver <elver@xxxxxxxxxx>

I will apply it on the next rebase, thank you!

> I was merely a little surprised syzbot was able to exercise RCU in a
> way that resulted in a data race your torture runs hadn't found yet
> (or perhaps it did and missed?).

My KCSAN runs are necessarily quite short: I run normal, KASAN, and
KCSAN variants of every scenario of every torture test, and the whole
set must finish overnight.

So there are probably more to find. ;-)

Thanx, Paul

> Thanks,
> -- Marco
>
> > Thanx, Paul
> >
> > ------------------------------------------------------------------------
> >
> > commit 43e0f01f3b6f510dbe31d02a8f4c909c45deff04
> > Author: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > Date: Tue Jul 20 06:16:27 2021 -0700
> >
> >     rcu: Mark accesses to rcu_state.n_force_qs
> >
> >     This commit marks accesses to rcu_state.n_force_qs. These data
> >     races are hard to make happen, but syzkaller was equal to the task.
> >
> >     Reported-by: syzbot+e08a83a1940ec3846cd5@xxxxxxxxxxxxxxxxxxxxxxxxx
> >     Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> >
> > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > index a7379c44a2366..245bca7cdf6ee 100644
> > --- a/kernel/rcu/tree.c
> > +++ b/kernel/rcu/tree.c
> > @@ -1913,7 +1913,7 @@ static void rcu_gp_fqs(bool first_time)
> >  	struct rcu_node *rnp = rcu_get_root();
> >
> >  	WRITE_ONCE(rcu_state.gp_activity, jiffies);
> > -	rcu_state.n_force_qs++;
> > +	WRITE_ONCE(rcu_state.n_force_qs, rcu_state.n_force_qs + 1);
> >  	if (first_time) {
> >  		/* Collect dyntick-idle snapshots. */
> >  		force_qs_rnp(dyntick_save_progress_counter);
> > @@ -2556,7 +2556,7 @@ static void rcu_do_batch(struct rcu_data *rdp)
> >  	/* Reset ->qlen_last_fqs_check trigger if enough CBs have drained. */
> >  	if (count == 0 && rdp->qlen_last_fqs_check != 0) {
> >  		rdp->qlen_last_fqs_check = 0;
> > -		rdp->n_force_qs_snap = rcu_state.n_force_qs;
> > +		rdp->n_force_qs_snap = READ_ONCE(rcu_state.n_force_qs);
> >  	} else if (count < rdp->qlen_last_fqs_check - qhimark)
> >  		rdp->qlen_last_fqs_check = count;
> >
> > @@ -2904,10 +2904,10 @@ static void __call_rcu_core(struct rcu_data *rdp, struct rcu_head *head,
> >  		} else {
> >  			/* Give the grace period a kick. */
> >  			rdp->blimit = DEFAULT_MAX_RCU_BLIMIT;
> > -			if (rcu_state.n_force_qs == rdp->n_force_qs_snap &&
> > +			if (READ_ONCE(rcu_state.n_force_qs) == rdp->n_force_qs_snap &&
> >  			    rcu_segcblist_first_pend_cb(&rdp->cblist) != head)
> >  				rcu_force_quiescent_state();
> > -			rdp->n_force_qs_snap = rcu_state.n_force_qs;
> > +			rdp->n_force_qs_snap = READ_ONCE(rcu_state.n_force_qs);
> >  			rdp->qlen_last_fqs_check = rcu_segcblist_n_cbs(&rdp->cblist);
> >  		}
> >  	}
> > @@ -4134,7 +4134,7 @@ int rcutree_prepare_cpu(unsigned int cpu)
> >  	/* Set up local state, ensuring consistent view of global state. */
> >  	raw_spin_lock_irqsave_rcu_node(rnp, flags);
> >  	rdp->qlen_last_fqs_check = 0;
> > -	rdp->n_force_qs_snap = rcu_state.n_force_qs;
> > +	rdp->n_force_qs_snap = READ_ONCE(rcu_state.n_force_qs);
> >  	rdp->blimit = blimit;
> >  	rdp->dynticks_nesting = 1;	/* CPU not up, no tearing. */
> >  	rcu_dynticks_eqs_online();
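For readers less familiar with the pattern the patch applies, here is a rough userspace sketch of the single-writer counter and the reader-side snapshot check from __call_rcu_core(). It assumes C11 relaxed atomics as a stand-in for READ_ONCE()/WRITE_ONCE(); the two are similar in spirit (no tearing, no compiler fusing or refetching, no added ordering) but not identical. The names fake_rcu_state, fake_rcu_data, fqs_tick(), and need_kick() are hypothetical, and the sketch drops the rcu_segcblist_first_pend_cb() condition and all surrounding kernel context.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for rcu_state: a counter read by many CPUs. */
struct fake_rcu_state {
	_Atomic unsigned long n_force_qs;
};

/* Hypothetical stand-in for rcu_data: per-CPU snapshot of the counter. */
struct fake_rcu_data {
	unsigned long n_force_qs_snap;	/* touched only by its owner */
};

/*
 * Updater side, modeled on rcu_gp_fqs(). Assumes exactly one updater
 * thread. The kernel version may use a plain load of its own variable
 * (only it writes the counter), but C11 requires an atomic access here,
 * so a relaxed load is used. The relaxed store plays the role of
 * WRITE_ONCE(). This is NOT an atomic read-modify-write: two concurrent
 * updaters could lose increments.
 */
static void fqs_tick(struct fake_rcu_state *rsp)
{
	unsigned long n = atomic_load_explicit(&rsp->n_force_qs,
					       memory_order_relaxed);

	atomic_store_explicit(&rsp->n_force_qs, n + 1, memory_order_relaxed);
}

/*
 * Reader side, modeled on __call_rcu_core(): returns true if no
 * force-quiescent-state pass has happened since the last snapshot
 * (the caller would then kick the grace period), and refreshes the
 * snapshot either way. Like the patch, it reads the counter twice,
 * each time with a marked (relaxed) load.
 */
static bool need_kick(struct fake_rcu_state *rsp, struct fake_rcu_data *rdp)
{
	bool stalled = atomic_load_explicit(&rsp->n_force_qs,
					    memory_order_relaxed) ==
		       rdp->n_force_qs_snap;

	rdp->n_force_qs_snap = atomic_load_explicit(&rsp->n_force_qs,
						    memory_order_relaxed);
	return stalled;
}
```

The load-then-store in fqs_tick() is only correct because a single thread updates the counter, which matches the report: the writer is rcu_gp_kthread and the racing readers come in via call_rcu().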