Re: [PATCH 01/10] rcu: Directly lock rdp->nocb_lock on nocb code entrypoints
From: Joel Fernandes
Date: Tue May 26 2020 - 11:21:41 EST
Hi Paul,
On Fri, May 22, 2020 at 10:57:39AM -0700, Paul E. McKenney wrote:
> On Wed, May 20, 2020 at 08:29:49AM -0400, Joel Fernandes wrote:
> > On Wed, May 13, 2020 at 06:47:05PM +0200, Frederic Weisbecker wrote:
> > > Pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > > timers) can unconditionally lock rdp->nocb_lock as they always execute
> > > in the context of an offloaded rdp.
> > >
> > > This also prepare for toggling CPUs to/from callback's offloaded mode
> > > where the offloaded state will possibly change when rdp->nocb_lock
> > > isn't taken. We'll still want the entrypoints to lock the rdp in any
> > > case.
> >
> > Suggested rewrite for change log:
> > <wordsmithing>
> > Make pure NOCB code entrypoints (nocb_cb kthread, nocb_gp kthread, nocb
> > timers) unconditionally lock rdp->nocb_lock as they always execute in the
> > context of an offloaded rdp.
> >
> > This prepares for future toggling of CPUs to/from callback's offloaded mode
> > where the offloaded state can change when rdp->nocb_lock is not held. We'll
> > still want the entrypoints to lock the rdp in any case.
> > </wordsmithing>
> >
> > Also, can we inline rcu_nocb_lock_irqsave() into
> > do_nocb_deferred_wakeup_common() since that's the only user, and then delete
> > rcu_nocb_lock_irqsave() and the corresponding unlock? That would also remove
> > confusion about which API to use for nocb locking (i.e. whether to directly
> > acquire lock or call rcu_nocb_lock_irqsave()).
> >
> > Reviewed-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
>
> Thank you for looking this over, Joel!
>
> Is it feasible to make rcu_nocb_lock*() and rcu_nocb_unlock*() "do the
> right thing", even when things are changing?
One way to prevent things from changing could be to employ Steven's
poor-man's RCU to mediate the "changing state" itself assuming the switch
from the fast-path to the slow-path does not happen often. :) So whichever
path is changing the state needs to wait for that poor-man's GP.
Any case, are you concerned with the performance issues with the
unconditional locking and that's why you suggest still keeping it conditional?
Also, coming back to my point of inline the helper into the last caller -
do_nocb_deferred_wakeup_common(). I think since we don't need to check if the
rdp is offloaded and do any conditional locking. The timer can be called only
with offloaded rdp. So we can directly do the unconditional spinlock instead
of using the rcu_nocb_lock helper there.
> would prevent any number of "interesting" copy-pasta and "just now became
> common code" bugs down the road. And because irqs are disabled while
> holding the lock, it should be possible to keep state on a per-CPU basis.
Agreed, that would be nice. Though if we could keep simple, that'd be nice
too.
> The ugliest scenario is callback adoption, where there are two ->cblist
> structures in need of being locked. In that case, changes are excluded
> (because that is in CPU hotplug code), but is it possible to take
> advantage of that reasonably?
Could you describe this a bit more? Thanks.
> Maybe these changes are the best we can do, but it would be good to
> if the same primitive locked a ->cblist regardless of context.
Here you are comparing 2 primitives. Do you mean that just IRQs being
disabled is another primitive, and rcu_nocb_lock is another one?
BTW, I'm really itching to give it a try to make the scheduler more deadlock
resilient (that is, if the scheduler wake up path detects a deadlock, then it
defers the wake up using timers, or irq_work on its own instead of passing
the burden of doing so to the callers). Thoughts?
thanks,
- Joel
> Can that be made to work reasonably?
>
> Thanx, Paul
>
> > thanks,
> >
> > - Joel
> >
> >
> > >
> > > Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > > Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > > Cc: Josh Triplett <josh@xxxxxxxxxxxxxxxx>
> > > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> > > Cc: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
> > > Cc: Lai Jiangshan <jiangshanlai@xxxxxxxxx>
> > > Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
> > > ---
> > > kernel/rcu/tree_plugin.h | 14 +++++++-------
> > > 1 file changed, 7 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
> > > index 097635c41135..523570469864 100644
> > > --- a/kernel/rcu/tree_plugin.h
> > > +++ b/kernel/rcu/tree_plugin.h
> > > @@ -1909,7 +1909,7 @@ static void do_nocb_bypass_wakeup_timer(struct timer_list *t)
> > > struct rcu_data *rdp = from_timer(rdp, t, nocb_bypass_timer);
> > >
> > > trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Timer"));
> > > - rcu_nocb_lock_irqsave(rdp, flags);
> > > + raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > smp_mb__after_spinlock(); /* Timer expire before wakeup. */
> > > __call_rcu_nocb_wake(rdp, true, flags);
> > > }
> > > @@ -1942,7 +1942,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > */
> > > for (rdp = my_rdp; rdp; rdp = rdp->nocb_next_cb_rdp) {
> > > trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("Check"));
> > > - rcu_nocb_lock_irqsave(rdp, flags);
> > > + raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > > if (bypass_ncbs &&
> > > (time_after(j, READ_ONCE(rdp->nocb_bypass_first) + 1) ||
> > > @@ -1951,7 +1951,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > (void)rcu_nocb_try_flush_bypass(rdp, j);
> > > bypass_ncbs = rcu_cblist_n_cbs(&rdp->nocb_bypass);
> > > } else if (!bypass_ncbs && rcu_segcblist_empty(&rdp->cblist)) {
> > > - rcu_nocb_unlock_irqrestore(rdp, flags);
> > > + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > continue; /* No callbacks here, try next. */
> > > }
> > > if (bypass_ncbs) {
> > > @@ -1996,7 +1996,7 @@ static void nocb_gp_wait(struct rcu_data *my_rdp)
> > > } else {
> > > needwake = false;
> > > }
> > > - rcu_nocb_unlock_irqrestore(rdp, flags);
> > > + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > if (needwake) {
> > > swake_up_one(&rdp->nocb_cb_wq);
> > > gotcbs = true;
> > > @@ -2084,7 +2084,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > > rcu_do_batch(rdp);
> > > local_bh_enable();
> > > lockdep_assert_irqs_enabled();
> > > - rcu_nocb_lock_irqsave(rdp, flags);
> > > + raw_spin_lock_irqsave(&rdp->nocb_lock, flags);
> > > if (rcu_segcblist_nextgp(&rdp->cblist, &cur_gp_seq) &&
> > > rcu_seq_done(&rnp->gp_seq, cur_gp_seq) &&
> > > raw_spin_trylock_rcu_node(rnp)) { /* irqs already disabled. */
> > > @@ -2092,7 +2092,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > > raw_spin_unlock_rcu_node(rnp); /* irqs remain disabled. */
> > > }
> > > if (rcu_segcblist_ready_cbs(&rdp->cblist)) {
> > > - rcu_nocb_unlock_irqrestore(rdp, flags);
> > > + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > if (needwake_gp)
> > > rcu_gp_kthread_wake();
> > > return;
> > > @@ -2100,7 +2100,7 @@ static void nocb_cb_wait(struct rcu_data *rdp)
> > >
> > > trace_rcu_nocb_wake(rcu_state.name, rdp->cpu, TPS("CBSleep"));
> > > WRITE_ONCE(rdp->nocb_cb_sleep, true);
> > > - rcu_nocb_unlock_irqrestore(rdp, flags);
> > > + raw_spin_unlock_irqrestore(&rdp->nocb_lock, flags);
> > > if (needwake_gp)
> > > rcu_gp_kthread_wake();
> > > swait_event_interruptible_exclusive(rdp->nocb_cb_wq,
> > > --
> > > 2.25.0
> > >