Re: [PATCH 01/11] rcu: avoid leaking exp_deferred_qs into next GP
From: Paul E. McKenney
Date: Thu Oct 31 2019 - 15:00:19 EST
On Fri, Nov 01, 2019 at 02:19:13AM +0800, Lai Jiangshan wrote:
>
>
> On 2019/10/31 9:43 PM, Paul E. McKenney wrote:
> > On Thu, Oct 31, 2019 at 10:07:56AM +0000, Lai Jiangshan wrote:
> > > If exp_deferred_qs is incorrectly set and leaked to the next
> > > exp GP, it may cause the next GP to complete prematurely.
> >
> > Could you please provide the sequence of events leading to such a
> > failure?
>
> I was just nervous about "leaking" exp_deferred_qs;
> I didn't carefully consider the sequence of events.
>
> It turns out that I had misunderstood exp_deferred_qs.
> Calling it "leaking" is the wrong concept: preempt_disable()
> is treated like rcu_read_lock(), so exp_deferred_qs
> needs to be set.
Thank you for checking, and yes, this code is a bit subtle. So good
on you for digging into it!
Thanx, Paul
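Lai's conclusion, that a preempt_disable() region is treated like an RCU read-side critical section and that exp_deferred_qs is therefore set on purpose rather than leaked, can be illustrated with a deliberately simplified, single-threaded C model. Everything here is hypothetical: struct model_cpu and the model_* functions are illustration-only names, not kernel APIs, and the rcu_node tree, IPIs, locking, and the rcu_read_lock() path are all omitted. In the model, the outermost model_preempt_enable() stands in for the reschedule that would eventually report the deferred quiescent state.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical single-CPU model, NOT kernel code.  Field names echo the
 * kernel's, but the rcu_node tree, IPIs and locking are omitted.
 */
struct model_cpu {
	int preempt_count;      /* > 0 means preemption is disabled */
	bool exp_deferred_qs;   /* a quiescent state is owed to the exp GP */
	bool exp_gp_waiting;    /* stands in for rnp->expmask & rdp->grpmask */
	int qs_reported;        /* expedited quiescent states reported so far */
};

/* Report the owed quiescent state, if the GP is still waiting on us. */
static void model_report_exp_qs(struct model_cpu *c)
{
	if (c->exp_gp_waiting) {
		c->exp_gp_waiting = false;
		c->qs_reported++;
	}
	c->exp_deferred_qs = false;
}

/* Rough analogue of the "not in rcu_read_lock()" case of the handler. */
static void model_exp_handler(struct model_cpu *c)
{
	if (c->preempt_count == 0) {
		model_report_exp_qs(c);   /* quiescent right now */
		return;
	}
	/* Preemption disabled: treated like a read-side critical section,
	 * so the quiescent state must wait until the section ends. */
	c->exp_deferred_qs = true;
}

static void model_preempt_disable(struct model_cpu *c)
{
	c->preempt_count++;
}

/* The outermost enable stands in for the reschedule that would
 * eventually report the deferred quiescent state. */
static void model_preempt_enable(struct model_cpu *c)
{
	assert(c->preempt_count > 0);
	if (--c->preempt_count == 0 && c->exp_deferred_qs)
		model_report_exp_qs(c);
}
```

With exp_gp_waiting set, calling model_preempt_disable(), model_exp_handler(), then model_preempt_enable() leaves exp_deferred_qs set only between the last two calls, and qs_reported ends at 1. The flag is consumed when the preempt-disabled region ends instead of surviving into a later grace period.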
> Thanks
> Lai
>
> ============don't need to read:
>
> rcu_read_lock()
> // other cpu starts exp GP_A
> preempt_schedule() // queues itself
> rcu_read_unlock() // reports qs; other cpu is sending an IPI to me
> preempt_disable()
> rcu_exp_handler() interrupts for GP_A and leaves an exp_deferred_qs
> // exp GP_A finished
> ---------------above is one possible way to leave an exp_deferred_qs
> preempt_enable()
> interrupt before preempt_schedule()
> rcu_read_lock()
> rcu_read_unlock()
> NESTED interrupt while rcu_read_lock_nesting is negative
> rcu_read_lock()
> // other cpu starts exp GP_B
> NESTED interrupt for rcu_flavor_sched_clock_irq()
> reports exp qs since rcu_read_lock_nesting < 0 and \
> exp_deferred_qs is true
> // exp GP_B completes
> rcu_read_unlock()
>
> This plausible sequence also relies on NESTED interrupts,
> and could be avoided by patch 2 if NESTED interrupts were allowed.
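The hazard in Lai's hypothetical sequence is that a flag set on behalf of GP_A is still set when GP_B starts, and a tick-time report that trusts the flag then tells GP_B that this CPU is quiescent while a reader inside the nested interrupt is still running. The C sketch below replays that second half of the sequence on a toy model. It is an illustration under stated assumptions only: MODEL_NEST_BIAS, struct replay_cpu, and the model_* functions are made up for this example, the negative nesting value merely mimics the "negative rcu_read_lock_nesting" state the sequence mentions, and, as Lai notes, the sequence needs nested interrupts to occur at all.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the large negative nesting value seen while
 * the outermost rcu_read_unlock() is in progress. */
#define MODEL_NEST_BIAS 1000

/* Toy per-CPU state; NOT kernel code. */
struct replay_cpu {
	int rcu_read_lock_nesting;  /* < 0: interrupted mid-rcu_read_unlock() */
	bool exp_deferred_qs;       /* possibly stale flag from GP_A */
	bool gp_waiting;            /* GP_B still waiting on this CPU */
	bool reader_active;         /* a reader GP_B must wait for */
};

static void model_rcu_read_lock(struct replay_cpu *c)
{
	c->rcu_read_lock_nesting++;
	c->reader_active = true;
}

/*
 * Tick-time report as described in the sequence: it trusts
 * exp_deferred_qs whenever the nesting count is negative, without
 * knowing which GP set the flag.  Returns true if GP_B is told this
 * CPU is quiescent while a reader is still active.
 */
static bool model_tick_report(struct replay_cpu *c)
{
	bool premature = false;

	if (c->rcu_read_lock_nesting < 0 && c->exp_deferred_qs &&
	    c->gp_waiting) {
		premature = c->reader_active;
		c->gp_waiting = false;      /* GP_B may now complete */
		c->exp_deferred_qs = false;
	}
	return premature;
}

/* Replays the steps after the stale flag is left behind; returns true
 * if GP_B completes prematurely. */
static bool model_replay(bool stale_flag_from_gp_a)
{
	struct replay_cpu c = { 0 };

	c.exp_deferred_qs = stale_flag_from_gp_a;
	c.rcu_read_lock_nesting = -MODEL_NEST_BIAS; /* mid-rcu_read_unlock() */
	model_rcu_read_lock(&c);                    /* reader in NESTED irq */
	assert(c.rcu_read_lock_nesting < 0);        /* still looks negative */
	c.gp_waiting = true;                        /* other CPU starts GP_B */
	return model_tick_report(&c);               /* NESTED tick interrupt */
}
```

In this model, model_replay(true) returns true (GP_B is released while the nested reader runs), and model_replay(false) returns false, which is why the question of whether the flag can survive from GP_A into GP_B matters.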
>
> >
> > Also, did you provoke such a failure in testing? If so, an upgrade
> > to rcutorture would be good, so please tell me what you did to make
> > the failure happen.
> >
> > I do like the reduction in state space, but I am a bit concerned about
> > the potential increase in contention on rnp->lock. Thoughts?
> >
> > Thanx, Paul
> >
> > > Signed-off-by: Lai Jiangshan <laijs@xxxxxxxxxxxxxxxxx>
> > > ---
> > > kernel/rcu/tree_exp.h | 23 ++++++++++++++---------
> > > 1 file changed, 14 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index a0e1e51c51c2..6dec21909b30 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -603,6 +603,18 @@ static void rcu_exp_handler(void *unused)
> > > struct rcu_node *rnp = rdp->mynode;
> > > struct task_struct *t = current;
> > > + /*
> > > + * Note that there is a large group of race conditions that
> > > + * can have caused this quiescent state to already have been
> > > + * reported, so we really do need to check ->expmask first.
> > > + */
> > > + raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > + if (!(rnp->expmask & rdp->grpmask)) {
> > > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > + return;
> > > + }
> > > + raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > +
> > > /*
> > > * First, the common case of not being in an RCU read-side
> > > * critical section. If also enabled or idle, immediately
> > > @@ -628,17 +640,10 @@ static void rcu_exp_handler(void *unused)
> > > * a future context switch. Either way, if the expedited
> > > * grace period is still waiting on this CPU, set ->deferred_qs
> > > * so that the eventual quiescent state will be reported.
> > > - * Note that there is a large group of race conditions that
> > > - * can have caused this quiescent state to already have been
> > > - * reported, so we really do need to check ->expmask.
> > > */
> > > if (t->rcu_read_lock_nesting > 0) {
> > > - raw_spin_lock_irqsave_rcu_node(rnp, flags);
> > > - if (rnp->expmask & rdp->grpmask) {
> > > - rdp->exp_deferred_qs = true;
> > > - t->rcu_read_unlock_special.b.exp_hint = true;
> > > - }
> > > - raw_spin_unlock_irqrestore_rcu_node(rnp, flags);
> > > + rdp->exp_deferred_qs = true;
> > > + WRITE_ONCE(t->rcu_read_unlock_special.b.exp_hint, true);
> > > return;
> > > }
> > > --
> > > 2.20.1
> > >
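Paul's concern about rnp->lock contention follows from where the patch moves the ->expmask check: before the patch, the handler takes the lock only when it interrupts an rcu_read_lock() section (rcu_read_lock_nesting > 0); after it, the check is at the top of rcu_exp_handler(), so the lock is taken on every call. The toy C model below counts lock acquisitions for both orderings. It is a hypothetical sketch, not kernel code: model_lock()/model_unlock() are counters standing in for raw_spin_lock_irqsave_rcu_node() and raw_spin_unlock_irqrestore_rcu_node(), the per-CPU rdp and per-task t state are merged into one struct, the WRITE_ONCE() on exp_hint becomes a plain store, and the handler's other cases are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, NOT kernel code.  A counter and a "held" flag
 * stand in for the rcu_node's raw spinlock. */
struct model_node {
	unsigned long expmask;            /* CPUs the exp GP still waits on */
	unsigned long lock_acquisitions;  /* times the lock was taken */
	bool held;
};

/* Per-CPU (rdp) and per-task (t) state merged for brevity. */
struct handler_cpu {
	struct model_node *rnp;
	unsigned long grpmask;
	int rcu_read_lock_nesting;
	bool exp_deferred_qs;
	bool exp_hint;
};

static void model_lock(struct model_node *rnp)
{
	assert(!rnp->held);
	rnp->held = true;
	rnp->lock_acquisitions++;
}

static void model_unlock(struct model_node *rnp)
{
	assert(rnp->held);
	rnp->held = false;
}

/* Before the patch: ->expmask is checked under the lock only when the
 * handler interrupts an rcu_read_lock() section.  Other cases omitted. */
static void handler_before(struct handler_cpu *c)
{
	if (c->rcu_read_lock_nesting > 0) {
		model_lock(c->rnp);
		if (c->rnp->expmask & c->grpmask) {
			c->exp_deferred_qs = true;
			c->exp_hint = true;
		}
		model_unlock(c->rnp);
	}
}

/* After the patch: ->expmask is checked under the lock on every call,
 * then the flags are set without the lock.  Other cases omitted. */
static void handler_after(struct handler_cpu *c)
{
	model_lock(c->rnp);
	if (!(c->rnp->expmask & c->grpmask)) {
		model_unlock(c->rnp);
		return;
	}
	model_unlock(c->rnp);

	if (c->rcu_read_lock_nesting > 0) {
		c->exp_deferred_qs = true;
		c->exp_hint = true;   /* WRITE_ONCE() in the patch */
	}
}
```

For a CPU with rcu_read_lock_nesting == 0, handler_before() takes no lock while handler_after() takes it once; with nesting 1, each takes it once and both set exp_deferred_qs. The extra acquisitions in the common, not-in-a-reader case are the source of the possible contention increase.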