Re: [PATCH] rcu-tasks: Avoid rtp_irq_work triggering when the rcu-tasks GP is ongoing

From: Paul E. McKenney
Date: Sun Mar 17 2024 - 03:31:12 EST


On Sun, Mar 17, 2024 at 03:12:59PM +0800, Z qiang wrote:
> >
> > On Mon, Mar 11, 2024 at 11:55:02AM +0800, Zqiang wrote:
> > > This commit adds rcu_task_gp_in_progress() to check whether the
> > > rcu-tasks GP is ongoing; if it is, call_rcu_tasks_generic() avoids
> > > triggering rtp_irq_work to wake up the rcu-tasks kthread.
> > >
> > > The test results are as follows:
> > >
> > > echo call_rcu_tasks_iw_wakeup > /sys/kernel/debug/tracing/set_ftrace_filter
> > > echo 1 > /sys/kernel/debug/tracing/function_profile_enabled
> > > insmod rcutorture.ko torture_type=tasks-tracing fwd_progress=4
> > > sleep 600
> > > rmmod rcutorture.ko
> > > echo 0 > /sys/kernel/debug/tracing/function_profile_enabled
> > > echo > /sys/kernel/debug/tracing/set_ftrace_filter
> > >
> > > head /sys/kernel/debug/tracing/trace_stat/function*
> > >
> > > original: 56376 total hits, apply patch: 33521 total hits
> > >
> > > Signed-off-by: Zqiang <qiang.zhang1211@xxxxxxxxx>
> >
> > Note that rcu_seq_current() does not provide ordering. So are you
> > sure that this change is safe on weakly ordered systems such as ARM?
>
> The rcu_seq_current() provides an implicit address-dependency barrier, or
> did I miss something?

It would be ordered after the last update of the grace-period sequence
number, but that update has not happened yet.

> > For example, consider the following sequence of events:
> >
> > o The call_rcu_tasks_generic() function picks up the grace-period
> > sequence number, which shows that there is a grace period in
> > progress.
>
> The rcu callback has been enqueued onto the list before we pick up the
> gp seq number.

That is true, but what guarantees that the grace-period kthread will see
the callback as having been enqueued? In a kernel that was built with
CONFIG_RCU_NOCB_CPU, rcu_segcblist_n_cbs() is just a READ_ONCE(), right?
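
Something like this, if memory serves (the CONFIG_RCU_NOCB_CPU variant
uses atomic_long_read(), which provides no ordering either):

	/* Simplified sketch of rcu_segcblist_n_cbs() from rcu_segcblist.h. */
	static inline long rcu_segcblist_n_cbs(struct rcu_segcblist *rsclp)
	{
		return READ_ONCE(rsclp->len);	/* Unordered read of the count. */
	}

Either way, there is nothing here pairing with the enqueue on the other CPU.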

> > o The grace period ends, and sees no reason to start a new grace
> > period.
>
> When the gp ends, rcu_tasks_need_gpcb() will be invoked to check
> whether to start a new gp. If it finds a pending callback, a new gp
> will start, or did I miss something?

Maybe you aren't missing something, but in that case I need you to tell
me exactly what guarantees the ordering. Please keep in mind that even
in x86, propagation delays can result in reads on one CPU returning the
old value long after some other CPU wrote the new value. And at least
on the surface, this scenario involves each CPU doing an unordered read
and getting the old value. (The CPU queueing the new callback got an
old value for the grace-period sequence number and the CPU ending the
grace period got an old value for the number of callbacks queued at the
other CPU.)
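
To put that scenario in litmus-test form (simplified, with gp_seq=1
standing in for "grace period in progress", gp_seq=2 for "grace period
ended", and len standing in for the callback count), it is just the
classic store-buffering pattern:

	C rcu-tasks-wakeup-race

	{
		gp_seq=1;	/* Grace period initially in progress. */
	}

	P0(int *len, int *gp_seq)	/* call_rcu_tasks_generic() */
	{
		int r0;

		WRITE_ONCE(*len, 1);		/* Enqueue the callback. */
		r0 = READ_ONCE(*gp_seq);	/* Proposed in-progress check. */
	}

	P1(int *len, int *gp_seq)	/* Grace-period kthread. */
	{
		int r1;

		WRITE_ONCE(*gp_seq, 2);		/* End the grace period. */
		r1 = READ_ONCE(*len);		/* rcu_tasks_need_gpcb() check. */
	}

	exists (0:r0=1 /\ 1:r1=0)

The "exists" clause is allowed by LKMM (and by real hardware) in the
absence of full barriers, and it is exactly the case above: no wakeup,
no new grace period, and a callback that is never invoked.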

So please tell me exactly what prevents that scenario from happening.

Thanx, Paul

> Thanks
> Zqiang
>
>
> >
> > o The call_rcu_tasks_generic() function sees no reason to wake
> > up the grace-period kthread. There are no more calls to
> > call_rcu_tasks*(), so the callback is never invoked.
> >
> > Or is there something that prevents this sequence of events from
> > ever happening on weakly ordered systems?
> >
> > Thanx, Paul
> >
> > > ---
> > >
> > > original:
> > > ==> /sys/kernel/debug/tracing/trace_stat/function0 <==
> > > Function                     Hit       Time           Avg         s^2
> > > --------                     ---       ----           ---         ---
> > > call_rcu_tasks_iw_wakeup     13217     19292.52 us    1.459 us    8.834 us
> > >
> > > ==> /sys/kernel/debug/tracing/trace_stat/function1 <==
> > > Function                     Hit       Time           Avg         s^2
> > > --------                     ---       ----           ---         ---
> > > call_rcu_tasks_iw_wakeup     15146     22377.01 us    1.477 us    22.873 us
> > >
> > > ==> /sys/kernel/debug/tracing/trace_stat/function2 <==
> > > Function                     Hit       Time           Avg         s^2
> > > --------                     ---       ----           ---         ---
> > > call_rcu_tasks_iw_wakeup     12561     18125.76 us    1.443 us    6.372 us
> > >
> > > ==> /sys/kernel/debug/tracing/trace_stat/function3 <==
> > > Function                     Hit       Time           Avg         s^2
> > > --------                     ---       ----           ---         ---
> > > call_rcu_tasks_iw_wakeup     15452     21770.57 us    1.408 us    6.710 us
> > >
> > > apply patch:
> > > ==> /sys/kernel/debug/tracing/trace_stat/function0 <==
> > > Function                     Hit       Time           Avg         s^2
> > > --------                     ---       ----           ---         ---
> > > call_rcu_tasks_iw_wakeup     8334      15121.13 us    1.814 us    4.457 us
> > >
> > > ==> /sys/kernel/debug/tracing/trace_stat/function1 <==
> > > Function                     Hit       Time           Avg         s^2
> > > --------                     ---       ----           ---         ---
> > > call_rcu_tasks_iw_wakeup     8355      15760.51 us    1.886 us    14.775 us
> > >
> > > ==> /sys/kernel/debug/tracing/trace_stat/function2 <==
> > > Function                     Hit       Time           Avg         s^2
> > > --------                     ---       ----           ---         ---
> > > call_rcu_tasks_iw_wakeup     7219      14194.27 us    1.966 us    42.440 us
> > >
> > > ==> /sys/kernel/debug/tracing/trace_stat/function3 <==
> > > Function                     Hit       Time           Avg         s^2
> > > --------                     ---       ----           ---         ---
> > > call_rcu_tasks_iw_wakeup     9613      19850.04 us    2.064 us    91.023 us
> > >
> > > kernel/rcu/tasks.h | 8 +++++++-
> > > 1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > index 147b5945d67a..36c7e1d441d0 100644
> > > --- a/kernel/rcu/tasks.h
> > > +++ b/kernel/rcu/tasks.h
> > > @@ -317,6 +317,11 @@ static void call_rcu_tasks_iw_wakeup(struct irq_work *iwp)
> > >  	rcuwait_wake_up(&rtp->cbs_wait);
> > >  }
> > >
> > > +static int rcu_task_gp_in_progress(struct rcu_tasks *rtp)
> > > +{
> > > +	return rcu_seq_state(rcu_seq_current(&rtp->tasks_gp_seq));
> > > +}
> > > +
> > >  // Enqueue a callback for the specified flavor of Tasks RCU.
> > >  static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
> > >  				   struct rcu_tasks *rtp)
> > > @@ -375,7 +380,8 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func,
> > >  	}
> > >  	rcu_read_unlock();
> > >  	/* We can't create the thread unless interrupts are enabled. */
> > > -	if (needwake && READ_ONCE(rtp->kthread_ptr))
> > > +	if (needwake && READ_ONCE(rtp->kthread_ptr) &&
> > > +	    !rcu_task_gp_in_progress(rtp))
> > >  		irq_work_queue(&rtpcp->rtp_irq_work);
> > >  }
> > >
> > > --
> > > 2.17.1
> > >