Re: [PATCH] rcu: Add exp QS check in rcu_exp_handler() for no-preemptible expedited RCU
From: Paul E. McKenney
Date: Wed Jun 22 2022 - 20:34:17 EST
On Wed, Jun 22, 2022 at 11:34:15PM +0000, Zhang, Qiang1 wrote:
>
> On Wed, Jun 22, 2022 at 06:35:49PM +0800, Zqiang wrote:
> > In CONFIG_PREEMPT=n and CONFIG_PREEMPT_COUNT=y kernel, after a exp
> > grace period begins, if detected current CPU enters idle in
> > rcu_exp_handler() IPI handler, will immediately report the exp QS of the
> > current cpu, at this time, maybe not being in an RCU read-side critical
> > section, but need wait until rcu-softirq or sched-clock irq or sched-switch
> > occurs on current CPU to check and report exp QS.
> >
>
> >I think the idea is OK, however, this "optimization" is based on the
> >implementation detail that rcu_read_lock() counts preempt_count when
> >CONFIG_PREEMPT_COUNT=y, right? It's a little bit dangerous because the
> >preempt_count when CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n is mostly
> >for debugging purposes IIUC, and in other words, _it could be gone_.
> >
>
> Yes, for CONFIG_PREEMPT_COUNT=y and CONFIG_PREEMPT=n kernel
> The rcu_read_lock/unlock are replaced by preempt_disbale/enable, and the
> preempt-count is exists, so can report exp QS when not being an RCU
> read-side critical(preempt_count & (PREEMPT_MASK | SOFTIRQ_MASK )return zero).
> in IPI handler.
>
> For CONFIG_PREEMPT_COUNT=n and CONFIG_PREEMPT=n kernel,
> The rcu_read_lock/unlock is just barrier().
>
>
> So I add IS_ENABLED(CONFIG_PREEMPT_COUNT) check in code.
>
> Of course, for CONFIG_PREEMPT_COUNT=n kernel, in RCU softirq, the
> preempt-count is also checked
>
> /* Report any deferred quiescent states if preemption enabled. */
> if (IS_ENABLED(CONFIG_PREEMPT_COUNT) && (!(preempt_count() & PREEMPT_MASK))) {
> rcu_preempt_deferred_qs(current);
>
> but the RCU softirq may not be triggered in time and reported exp QS, for
> example a kernel loop exist on NO_HZ_FULL CPU
>
> this change, It is to capture the exp QS state earlier and report it.
>
>
> >Also I'm not aware of any but there could be someone assuming that RCU
> >read-side critical sections can be formed without
> >rcu_read_{lock,unlock}() in CONFIG_PREEMPT=n kernel. For example, there
> >might be "creative" code like the following:
> >
> > void do_something_only_in_nonpreempt(void)
> > {
> > int *p;
> >
> > // This function only gets called in PREEMPT=n kernel,
> > // which means everywhere is a RCU read-side critical
> > // section, let's save some lines of code.
> >
> rcu_read_lock();
> > p = rcu_dereference_check(gp, !IS_ENABLED(PREEMPT));
> > ... // of course no schedule() here.
> > <access p>
> rcu_read_unlock();
> > }
> >
>
> Usually access to pointers of type rcu needs to be protected.
Indeed, lockdep would normally complain about this sort of thing.
But in kernels built with (say) CONFIG_PREEMPT_NONE=y but without
CONFIG_PREEMPT_COUNT=N, can lockdep really tell the difference?
> Any thoughts?
It would be good to have some performance data on this change to expedited
grace periods. It is adding code, so it needs some real motivation.
So, does this change make a system-level difference in (say) expedited
RCU grace-period latency, and if so, under what conditions?
Thanx, Paul
> >Again, I'm not aware of any existing code that does this but we need to
> >be sure.
> >
> >Regards,
> >Boqun
> >
> > This commit add a exp QS check in rcu_exp_handler(), when not being
> > in an RCU read-side critical section, report exp QS earlier.
> >
> > Signed-off-by: Zqiang <qiang1.zhang@xxxxxxxxx>
> > ---
> > kernel/rcu/tree_exp.h | 5 ++++-
> > 1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
> > index be667583a554..34f08267410f 100644
> > --- a/kernel/rcu/tree_exp.h
> > +++ b/kernel/rcu/tree_exp.h
> > @@ -828,11 +828,14 @@ static void rcu_exp_handler(void *unused)
> > {
> > struct rcu_data *rdp = this_cpu_ptr(&rcu_data);
> > struct rcu_node *rnp = rdp->mynode;
> > + bool preempt_bh_disabled =
> > + !!(preempt_count() & (PREEMPT_MASK | SOFTIRQ_MASK));
> >
> > if (!(READ_ONCE(rnp->expmask) & rdp->grpmask) ||
> > __this_cpu_read(rcu_data.cpu_no_qs.b.exp))
> > return;
> > - if (rcu_is_cpu_rrupt_from_idle()) {
> > + if (rcu_is_cpu_rrupt_from_idle() ||
> > + (IS_ENABLED(CONFIG_PREEMPT_COUNT) && !preempt_bh_disabled)) {
> > rcu_report_exp_rdp(this_cpu_ptr(&rcu_data));
> > return;
> > }
> > --
> > 2.25.1
> >