Re: scheduler problems in -next (was: Re: [PATCH 6.4 000/227] 6.4.7-rc1 review)

From: Paul E. McKenney
Date: Wed Aug 02 2023 - 12:52:31 EST


On Wed, Aug 02, 2023 at 04:31:12PM +0100, Roy Hopkins wrote:
> On Wed, 2023-08-02 at 08:05 -0700, Paul E. McKenney wrote:
> > On Wed, Aug 02, 2023 at 02:57:56PM +0100, Roy Hopkins wrote:
> > > On Tue, 2023-08-01 at 12:11 -0700, Paul E. McKenney wrote:
> > > > On Tue, Aug 01, 2023 at 10:32:45AM -0700, Guenter Roeck wrote:
> > > >
> > > >
> > > > Please see below for my preferred fix.  Does this work for you guys?
> > > >
> > > > Back to figuring out why recent kernels occasionally blow up all
> > > > rcutorture guest OSes...
> > > >
> > > >                                                         Thanx, Paul
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > > index 7294be62727b..2d5b8385c357 100644
> > > > --- a/kernel/rcu/tasks.h
> > > > +++ b/kernel/rcu/tasks.h
> > > > @@ -570,10 +570,12 @@ static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot)
> > > >         if (unlikely(midboot)) {
> > > >                 needgpcb = 0x2;
> > > >         } else {
> > > > +               mutex_unlock(&rtp->tasks_gp_mutex);
> > > >                 set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
> > > >                 rcuwait_wait_event(&rtp->cbs_wait,
> > > >                                    (needgpcb = rcu_tasks_need_gpcb(rtp)),
> > > >                                    TASK_IDLE);
> > > > +               mutex_lock(&rtp->tasks_gp_mutex);
> > > >         }
> > > >  
> > > >         if (needgpcb & 0x2) {
> > >
> > > Your preferred fix looks good to me.
> > >
> > > With the original code I can quite easily reproduce the problem on my
> > > system, roughly once every 10 reboots. With your fix in place the
> > > problem no longer occurs.
> >
> > Very good, thank you!  May I add your Tested-by?
> >
> >                                                         Thanx, Paul
> Yes, please do.

Thank you again, and I will apply this on my next rebase.

Thanx, Paul