Re: scheduler problems in -next (was: Re: [PATCH 6.4 000/227] 6.4.7-rc1 review)
From: Roy Hopkins
Date: Wed Aug 02 2023 - 09:58:05 EST
On Tue, 2023-08-01 at 12:11 -0700, Paul E. McKenney wrote:
> On Tue, Aug 01, 2023 at 10:32:45AM -0700, Guenter Roeck wrote:
>
>
> Please see below for my preferred fix. Does this work for you guys?
>
> Back to figuring out why recent kernels occasionally blow up all
> rcutorture guest OSes...
>
> Thanx, Paul
>
> ------------------------------------------------------------------------
>
> diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> index 7294be62727b..2d5b8385c357 100644
> --- a/kernel/rcu/tasks.h
> +++ b/kernel/rcu/tasks.h
> @@ -570,10 +570,12 @@ static void rcu_tasks_one_gp(struct rcu_tasks *rtp, bool midboot)
>  	if (unlikely(midboot)) {
>  		needgpcb = 0x2;
>  	} else {
> +		mutex_unlock(&rtp->tasks_gp_mutex);
>  		set_tasks_gp_state(rtp, RTGS_WAIT_CBS);
>  		rcuwait_wait_event(&rtp->cbs_wait,
>  				   (needgpcb = rcu_tasks_need_gpcb(rtp)),
>  				   TASK_IDLE);
> +		mutex_lock(&rtp->tasks_gp_mutex);
>  	}
>
>  	if (needgpcb & 0x2) {

Your preferred fix looks good to me.

With the original code I can reproduce the problem quite easily on my
system, roughly once in every 10 reboots. With your fix in place the
problem no longer occurs.
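
For anyone following along, the quoted patch releases tasks_gp_mutex before
the rcuwait_wait_event() call and takes it again afterwards. Below is a rough
user-space sketch of that "drop the lock across a blocking wait" pattern
using pthreads. It is only an analogy, not the kernel code: gp_mutex,
wait_lock, wait_cond, work_pending, gp_worker() and run_demo() are all
hypothetical names, and it assumes the concern is that some other path
needs the mutex while the waiter is asleep.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
static pthread_mutex_t gp_mutex = PTHREAD_MUTEX_INITIALIZER;  /* analogue of tasks_gp_mutex */
static pthread_mutex_t wait_lock = PTHREAD_MUTEX_INITIALIZER; /* protects work_pending */
static pthread_cond_t wait_cond = PTHREAD_COND_INITIALIZER;
static bool work_pending;
static int lock_users_done;

/* Worker: owns gp_mutex while working, but not while sleeping. */
static void *gp_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&gp_mutex);

	pthread_mutex_unlock(&gp_mutex);	/* drop before blocking */
	pthread_mutex_lock(&wait_lock);
	while (!work_pending)			/* sleep until work is posted */
		pthread_cond_wait(&wait_cond, &wait_lock);
	pthread_mutex_unlock(&wait_lock);
	pthread_mutex_lock(&gp_mutex);		/* re-take before doing the work */

	/* ... the actual work would run here, under gp_mutex ... */

	pthread_mutex_unlock(&gp_mutex);
	return NULL;
}

/*
 * Another path that needs gp_mutex before it can post work. If the worker
 * slept while still holding gp_mutex, this function would never get past
 * the first lock and the work would never be posted.
 * Returns the number of times this path got the mutex, or -1 on error.
 */
static int run_demo(void)
{
	pthread_t worker;

	if (pthread_create(&worker, NULL, gp_worker, NULL) != 0)
		return -1;

	pthread_mutex_lock(&gp_mutex);
	lock_users_done++;
	pthread_mutex_unlock(&gp_mutex);

	pthread_mutex_lock(&wait_lock);
	work_pending = true;			/* post the work and wake the worker */
	pthread_cond_signal(&wait_cond);
	pthread_mutex_unlock(&wait_lock);

	pthread_join(worker, NULL);
	return lock_users_done;
}
```

Calling run_demo() from a main() and building with -pthread should be
enough to try it. Moving the first pthread_mutex_unlock() in gp_worker() to
after the wait loop gives a version that can hang, depending on which thread
wins the race for gp_mutex.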