Re: [PATCH RFC kenrel/rcu] Eliminate BUG_ON() for sync.c

From: Oleg Nesterov
Date: Wed Oct 31 2018 - 13:26:11 EST


On 10/30, Paul E. McKenney wrote:
>
> On Mon, Oct 22, 2018 at 06:14:40PM +0200, Oleg Nesterov wrote:
> > > > ----------------------------------------------------------------------------
> > > > Damn.
> > > >
> > > > This suddenly reminds me that I rewrote this code completely, and you even
> > > > reviewed the new implementation and (iirc) acked it!
> > > >
> > > > However, I failed to force myself to rewrite the comments, and that is why
> > > > I didn't send the "official" patch :/
> > > >
> > > > May be some time...
> > >
> > > Could you please point me at the last email thread? Yes, I should be
> > > able to find it, but I would probably get the wrong one. :-/
> >
> > probably this one,
> >
> > [PATCH] rcu_sync: simplify the state machine, introduce __rcu_sync_enter()
> > https://lkml.org/lkml/2016/7/16/150
> >
> > but I am not sure, will recheck tomorrow.
>
> Just following up... Here is what I currently have.

Hmm. Are you sure you replied to the correct message? ;)

the patch below looks absolutely unrelated...

>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> commit 1c1d315dfb7049d0233b89948a3fbcb61ea15d26
> Author: Dennis Krein <Dennis.Krein@xxxxxxxxxx>
> Date: Fri Oct 26 07:38:24 2018 -0700
>
> srcu: Lock srcu_data structure in srcu_gp_start()
>
> The srcu_gp_start() function is called with the srcu_struct structure's
> ->lock held, but not with the srcu_data structure's ->lock. This is
> problematic because this function accesses and updates the srcu_data
> structure's ->srcu_cblist, which is protected by that lock. Failing to
> hold this lock can result in corruption of the SRCU callback lists,
> which in turn can result in arbitrarily bad results.
>
> This commit therefore makes srcu_gp_start() acquire the srcu_data
> structure's ->lock across the calls to rcu_segcblist_advance() and
> rcu_segcblist_accelerate(), thus preventing this corruption.
>
> Reported-by: Bart Van Assche <bvanassche@xxxxxxx>
> Reported-by: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> Signed-off-by: Dennis Krein <Dennis.Krein@xxxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxx>
>
> diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
> index 60f3236beaf7..697a2d7e8e8a 100644
> --- a/kernel/rcu/srcutree.c
> +++ b/kernel/rcu/srcutree.c
> @@ -451,10 +451,12 @@ static void srcu_gp_start(struct srcu_struct *sp)
>
> lockdep_assert_held(&ACCESS_PRIVATE(sp, lock));
> WARN_ON_ONCE(ULONG_CMP_GE(sp->srcu_gp_seq, sp->srcu_gp_seq_needed));
> + spin_lock_rcu_node(sdp); /* Interrupts already disabled. */
> rcu_segcblist_advance(&sdp->srcu_cblist,
> rcu_seq_current(&sp->srcu_gp_seq));
> (void)rcu_segcblist_accelerate(&sdp->srcu_cblist,
> rcu_seq_snap(&sp->srcu_gp_seq));
> + spin_unlock_rcu_node(sdp); /* Interrupts remain disabled. */
> smp_mb(); /* Order prior store to ->srcu_gp_seq_needed vs. GP start. */
> rcu_seq_start(&sp->srcu_gp_seq);
> state = rcu_seq_state(READ_ONCE(sp->srcu_gp_seq));
>