Re: [PATCH rcu 3/9] rcu/tree: Reduce wake up for synchronize_rcu() common case

From: Paul E. McKenney
Date: Thu Jun 06 2024 - 14:12:09 EST


On Thu, Jun 06, 2024 at 11:28:07AM +0530, Neeraj Upadhyay wrote:
> On Wed, Jun 5, 2024 at 10:05 PM Frederic Weisbecker <frederic@xxxxxxxxxx> wrote:
> >
> > On Tue, Jun 04, 2024 at 03:23:49PM -0700, Paul E. McKenney wrote:
> > > From: "Joel Fernandes (Google)" <joel@xxxxxxxxxxxxxxxxx>
> > >
> > > In the synchronize_rcu() common case, we will have less than
> > > SR_MAX_USERS_WAKE_FROM_GP number of users per GP. Waking up the kworker
> > > is pointless just to free the last injected wait head since at that point,
> > > all the users have already been awakened.
> > >
> > > Introduce a new counter to track this and prevent the wakeup in the
> > > common case.
> > >
> > > Signed-off-by: Joel Fernandes (Google) <joel@xxxxxxxxxxxxxxxxx>
> > > Reviewed-by: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> > > Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> > > ---
> > > kernel/rcu/tree.c | 35 ++++++++++++++++++++++++++++++-----
> > > kernel/rcu/tree.h | 1 +
> > > 2 files changed, 31 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > index 6ba36d9c09bde..2fe08e6186b4d 100644
> > > --- a/kernel/rcu/tree.c
> > > +++ b/kernel/rcu/tree.c
> > > @@ -96,6 +96,7 @@ static struct rcu_state rcu_state = {
> > >  	.ofl_lock = __ARCH_SPIN_LOCK_UNLOCKED,
> > >  	.srs_cleanup_work = __WORK_INITIALIZER(rcu_state.srs_cleanup_work,
> > >  		rcu_sr_normal_gp_cleanup_work),
> > > +	.srs_cleanups_pending = ATOMIC_INIT(0),
> > >  };
> > >
> > > /* Dump rcu_node combining tree at boot to verify correct setup. */
> > > @@ -1633,8 +1634,11 @@ static void rcu_sr_normal_gp_cleanup_work(struct work_struct *work)
> > >  	 * the done tail list manipulations are protected here.
> > >  	 */
> > >  	done = smp_load_acquire(&rcu_state.srs_done_tail);
> > > -	if (!done)
> > > +	if (!done) {
> > > +		/* See comments below. */
> > > +		atomic_dec_return_release(&rcu_state.srs_cleanups_pending);
> >
> > This condition is not supposed to happen. If the work is scheduled,
> > there has to be a wait_queue in rcu_state.srs_done_tail. And decrementing
> > may make things worse.
> >
>
> I also don't see a scenario where this can happen. However, if we are
> returning from here, given that for every queued work we do an
> increment of rcu_state.srs_cleanups_pending, I think it's safer to
> decrement in this case, as that counter tracks only the work queuing
> and execution counts.
>
> atomic_inc(&rcu_state.srs_cleanups_pending);
> if (!queue_work(sync_wq, &rcu_state.srs_cleanup_work))
> 	atomic_dec(&rcu_state.srs_cleanups_pending);

Linus Torvalds's general rule is that if you cannot imagine how a bug
can happen, don't attempt to clean up after it. His rationale (which
is *almost* always a good one) is that not knowing how the bug happens
means that attempts to clean up will usually just make matters worse.
And all too often, the clean-up code makes debugging more difficult.
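
To make the rule concrete, here is a rough userspace sketch of the
"warn and bail out, but do not repair" pattern. It is a single-threaded
model, not the kernel code: warn_on_once(), cleanups_pending, done_tail,
queue_cleanup() and cleanup_work() are all hypothetical stand-ins, and
the real handler does considerably more.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for WARN_ON_ONCE(): report the first hit only,
 * but always return the condition so the caller can bail out. */
static bool warned;
static int warn_count;

static bool warn_on_once(bool cond, const char *what)
{
	if (cond && !warned) {
		warned = true;
		warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/* Hypothetical model state: one pending-work counter, one done tail. */
static atomic_int cleanups_pending;
static void *done_tail;

/* Queueing work publishes a tail and accounts for the queued work. */
static void queue_cleanup(void *tail)
{
	assert(tail != NULL);	/* Queued work always has something to clean. */
	done_tail = tail;
	atomic_fetch_add(&cleanups_pending, 1);
}

/* Work handler: an empty tail is a "cannot happen" state, so complain
 * and leave the counter alone instead of guessing at a repair. */
static void cleanup_work(void)
{
	void *done = done_tail;

	if (warn_on_once(!done, "cleanup work ran with an empty done tail"))
		return;

	done_tail = NULL;
	atomic_fetch_sub_explicit(&cleanups_pending, 1, memory_order_release);
}
```

In this model, a spurious cleanup_work() call leaves cleanups_pending
exactly as it found it, so whatever state led to the warning is still
there to be inspected.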

One example exception to this rule is when debug-objects detects a
duplicate call_rcu(). In that case, we ignore that second call_rcu().
But the reason is that experience has shown that the usual cause really
is someone doing a duplicate call_rcu(), and also that ignoring the
second call_rcu() makes debugging easier.
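
For illustration only, the exception can be modeled in userspace along
these lines. This is not how debug-objects or call_rcu() are actually
implemented: struct cb_head, its queued flag, model_call_rcu() and
model_invoke_callbacks() are hypothetical names, and the flag merely
stands in for the state tracking that debug-objects provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical callback head; "queued" stands in for debug-objects state. */
struct cb_head {
	struct cb_head *next;
	void (*func)(struct cb_head *);
	bool queued;
};

static struct cb_head *cb_list;
static struct cb_head **cb_tail = &cb_list;
static int dup_warnings;
static int invoked;

/* Example callback that just counts invocations. */
static void count_cb(struct cb_head *head)
{
	(void)head;
	invoked++;
}

/* Enqueue a callback.  A head that is already queued is reported and
 * ignored: linking it a second time would corrupt the singly linked list. */
static bool model_call_rcu(struct cb_head *head, void (*func)(struct cb_head *))
{
	assert(func != NULL);
	if (head->queued) {
		dup_warnings++;
		fprintf(stderr, "WARNING: duplicate enqueue of %p ignored\n",
			(void *)head);
		return false;
	}
	head->queued = true;
	head->func = func;
	head->next = NULL;
	*cb_tail = head;
	cb_tail = &head->next;
	return true;
}

/* Detach the list, invoke every callback, and return how many ran. */
static int model_invoke_callbacks(void)
{
	struct cb_head *head = cb_list;
	int n = 0;

	cb_list = NULL;
	cb_tail = &cb_list;
	while (head) {
		struct cb_head *next = head->next;

		head->queued = false;
		head->func(head);
		n++;
		head = next;
	}
	return n;
}
```

Here the cause of the bad state is known (a second enqueue of the same
head), so dropping the duplicate keeps the list intact and the warning
still points at the offending caller.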

So what is it that Frederic and I are missing here?

Thanx, Paul

> Thanks
> Neeraj
>
> > So this should be:
> >
> > if (WARN_ON_ONCE(!done))
> > 	return;
> >
> > Thanks.
> >
>