Re: Scenario TREE07 with CONFIG_PREEMPT_DYNAMIC=n?

From: Paul E. McKenney
Date: Fri Mar 11 2022 - 11:06:28 EST


On Fri, Mar 11, 2022 at 04:46:03PM +0100, Frederic Weisbecker wrote:
> On Fri, Mar 11, 2022 at 07:21:48AM -0800, Paul E. McKenney wrote:
> > On Fri, Mar 11, 2022 at 02:07:19PM +0100, Frederic Weisbecker wrote:
> > > On Thu, Mar 10, 2022 at 02:52:19PM -0800, Paul E. McKenney wrote:
> > > > On Thu, Mar 10, 2022 at 11:41:03PM +0100, Frederic Weisbecker wrote:
> > > > > On Thu, Mar 10, 2022 at 01:56:30PM -0800, Paul E. McKenney wrote:
> > > > > > Hello, Frederic,
> > > > > >
> > > > > > I recently added CONFIG_PREEMPT_DYNAMIC=n to the TREE07 file, and since
> > > > > > then am getting roughly one RCU CPU stall warning (or silent hang)
> > > > > > per few tens of hours of rcutorture testing on dual-socket systems.
> > > > > > The stall warnings feature starvation of the RCU grace-period kthread.
> > > > > >
> > > > > > Any advice on debugging this?
> > > > >
> > > > > Oh, I'm testing that!
> > > >
> > > > Even better, thank you! ;-)
> > >
> > > Below is what I'm testing. Even if it happens to work, though, it's still not
> > > the most optimized way of dealing with the UP-on-SMP situation, as we still
> > > start an expedited grace period when we could return early. But since we have
> > > a cookie to pass to poll_state_synchronize_rcu_expedited()...
> > >
> > > Oh, but if we were to check for a positive rcu_blocking_is_gp() early in
> > > start_poll_synchronize_rcu_expedited(), we could simply return the current
> > > value of rcu_state.expedited_sequence without doing an rcu_exp_gp_seq_snap(),
> > > and then pass that to poll_state_synchronize_rcu_expedited(), which should
> > > then return immediately.
> > >
> > > Now even if we do that, we still need the patch below in case CPUs go offline
> > > in the middle of start_poll_synchronize_rcu_expedited() (again, assuming this
> > > fixes the issue; I'm running the test).
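
(For concreteness, here is a rough sketch of that early-return idea as I
understand it, not Frederic's patch below, and assuming the work-in-progress
API hands the polling cookie back as an unsigned long:)

/*
 * Sketch only: if blocking is already a grace period (one CPU online
 * and !CONFIG_PREEMPTION), hand back the current value of
 * rcu_state.expedited_sequence as the cookie, so that a later
 * poll_state_synchronize_rcu_expedited() can return immediately, as
 * described above, without ever starting an expedited grace period.
 * (This ignores the rcu_blocking_is_gp() issue discussed below.)
 */
unsigned long start_poll_synchronize_rcu_expedited(void)
{
        unsigned long s;

        if (rcu_blocking_is_gp())
                return READ_ONCE(rcu_state.expedited_sequence);

        s = rcu_exp_gp_seq_snap();
        /* ... queue up the expedited grace period as usual ... */
        return s;
}

The caller would then pass the returned cookie to
poll_state_synchronize_rcu_expedited() as usual; in the single-CPU
!PREEMPTION case, that first poll should already report completion.
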
> >
> > Color me slow and stupid!!!
> >
> > So the reason that this works for CONFIG_PREEMPT_DYNAMIC=y is that
> > rcu_blocking_is_gp() was never updated to account for CONFIG_PREEMPT_DYNAMIC.
> >
> > The first "if" in rcu_blocking_is_gp() needs to become something like
> > this:
> >
> > 	if (!preemption_boot_enabled())
> >
> > Where:
> >
> > bool preemption_boot_enabled(void)
> > {
> > #ifdef CONFIG_PREEMPT_DYNAMIC
> > 	return preempt_dynamic_mode == preempt_dynamic_full;
> > #else
> > 	return IS_ENABLED(CONFIG_PREEMPTION);
> > #endif
> > }
> >
> > Or am I missing something here?
>
> Oh right there is that too!

I am testing with that fastpath completely commented out, just to make
sure that we were seeing the same failure.
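
(For anyone following along: the fastpath in question is the
rcu_blocking_is_gp() early return, which in mainline at the time sits at the
top of synchronize_rcu_expedited(), roughly as follows; this fragment is for
orientation only, quoted from memory rather than from the tree under test.)

void synchronize_rcu_expedited(void)
{
        /* ... lockdep checks elided ... */

        /* If blocking already implies a grace period, we are done. */
        if (rcu_blocking_is_gp())
                return;

        /* ... otherwise run the full expedited machinery ... */
}

Commenting that check out forces the full expedited machinery even when
blocking would otherwise have counted as a grace period.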

> I think we need to apply this patch:
> https://lore.kernel.org/lkml/20211110202448.4054153-3-valentin.schneider@xxxxxxx/
> and then your above change. I can cook a series with the below.

Agreed, Valentin's approach is better than my preemption_boot_enabled().
But when will CONFIG_PREEMPT_RT be boot-time selectable? (/me runs!)
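
For reference, with the preempt_model_*() accessors that Valentin's series
adds, the helper above would collapse to something like this (a sketch,
assuming those accessors land as posted, and ignoring CONFIG_PREEMPT_RT per
the above):

/*
 * Sketch, assuming Valentin's preemption-model accessors: true iff the
 * kernel is running fully preemptible, whether that was chosen at build
 * time or via preempt=full on a CONFIG_PREEMPT_DYNAMIC=y kernel.
 */
bool preemption_boot_enabled(void)
{
        return preempt_model_full();
}

That would keep the RCU-side code ignorant of how the preemption mode was
chosen, which seems like the right division of labor.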

Looking forward to your series!

Thanx, Paul

> > > ---
> > > From 3c9f5df000b9659edbcf38cb87136fea1f8ac682 Mon Sep 17 00:00:00 2001
> > > From: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > > Date: Fri, 11 Mar 2022 13:30:02 +0100
> > > Subject: [PATCH] rcu: Fix rcu exp polling
> > >
> > > Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> > > ---
> > > kernel/rcu/tree_exp.h | 52 ++++++++++++++++++++++++-------------------
> > > 1 file changed, 29 insertions(+), 23 deletions(-)
> > >
> > > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > > index d5f30085b0cf..69c4dcc9159a 100644
> > > --- a/kernel/rcu/tree_exp.h
> > > +++ b/kernel/rcu/tree_exp.h
> > > @@ -794,27 +794,7 @@ static int rcu_print_task_exp_stall(struct rcu_node *rnp)
> > >
> > > #endif /* #else #ifdef CONFIG_PREEMPT_RCU */
> > >
> > > -/**
> > > - * synchronize_rcu_expedited - Brute-force RCU grace period
> > > - *
> > > - * Wait for an RCU grace period, but expedite it. The basic idea is to
> > > - * IPI all non-idle non-nohz online CPUs. The IPI handler checks whether
> > > - * the CPU is in an RCU critical section, and if so, it sets a flag that
> > > - * causes the outermost rcu_read_unlock() to report the quiescent state
> > > - * for RCU-preempt or asks the scheduler for help for RCU-sched. On the
> > > - * other hand, if the CPU is not in an RCU read-side critical section,
> > > - * the IPI handler reports the quiescent state immediately.
> > > - *
> > > - * Although this is a great improvement over previous expedited
> > > - * implementations, it is still unfriendly to real-time workloads, so is
> > > - * thus not recommended for any sort of common-case code. In fact, if
> > > - * you are using synchronize_rcu_expedited() in a loop, please restructure
> > > - * your code to batch your updates, and then use a single synchronize_rcu()
> > > - * instead.
> > > - *
> > > - * This has the same semantics as (but is more brutal than) synchronize_rcu().
> > > - */
> > > -void synchronize_rcu_expedited(void)
> >
> > We should have a header comment here, given that I missed the need for this
> > change, for example. ;-)
> >
> > But feel free to send a formal patch without it, and I can add it.
> >
> > Otherwise, it looks good.
>
> Ok, preparing this.
>
> Thanks!