Re: [PATCH rcu 2/6] rcu: Remove superfluous full memory barrier upon first EQS snapshot

From: Neeraj upadhyay
Date: Wed Jun 12 2024 - 04:27:50 EST


On Wed, Jun 5, 2024 at 3:58 AM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> From: Frederic Weisbecker <frederic@xxxxxxxxxx>
>
> When the grace period kthread checks the extended quiescent state
> counter of a CPU, full ordering is necessary to ensure that either:
>
> * If the GP kthread observes the remote target in an extended quiescent
> state, then that target must observe all accesses prior to the current
> grace period, including the current grace period sequence number, once
> it exits that extended quiescent state.
>
> or:
>
> * If the GP kthread observes the remote target NOT in an extended
> quiescent state, then the target further entering in an extended
> quiescent state must observe all accesses prior to the current
> grace period, including the current grace period sequence number, once
> it enters that extended quiescent state.
>
> This ordering is enforced through a full memory barrier placed right
> before taking the first EQS snapshot. However this is superfluous
> because the snapshot is taken while holding the target's rnp lock which
> provides the necessary ordering through its chain of
> smp_mb__after_unlock_lock().
>
> Remove the needless explicit barrier before the snapshot and put a
> comment about the implicit barrier newly relied upon here.
>
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> ---
> .../Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst | 6 +++---
> kernel/rcu/tree.c | 7 ++++++-
> 2 files changed, 9 insertions(+), 4 deletions(-)
>
> diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> index 5750f125361b0..728b1e690c646 100644
> --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> @@ -149,9 +149,9 @@ This case is handled by calls to the strongly ordered
> ``atomic_add_return()`` read-modify-write atomic operation that
> is invoked within ``rcu_dynticks_eqs_enter()`` at idle-entry
> time and within ``rcu_dynticks_eqs_exit()`` at idle-exit time.
> -The grace-period kthread invokes ``rcu_dynticks_snap()`` and
> -``rcu_dynticks_in_eqs_since()`` (both of which invoke
> -an ``atomic_add_return()`` of zero) to detect idle CPUs.
> +The grace-period kthread invokes first ``ct_dynticks_cpu_acquire()``
> +(preceded by a full memory barrier) and ``rcu_dynticks_in_eqs_since()``
> +(both of which rely on acquire semantics) to detect idle CPUs.
>
> +-----------------------------------------------------------------------+
> | **Quick Quiz**: |
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index f07b8bff4621b..1a6ef9c5c949e 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -769,7 +769,12 @@ static void rcu_gpnum_ovf(struct rcu_node *rnp, struct rcu_data *rdp)
> */
> static int dyntick_save_progress_counter(struct rcu_data *rdp)
> {
> - rdp->dynticks_snap = rcu_dynticks_snap(rdp->cpu);
> + /*
> + * Full ordering against accesses prior current GP and also against
> + * current GP sequence number is enforced by current rnp locking
> + * with chained smp_mb__after_unlock_lock().
> + */

Would it be worth mentioning in this comment that the chained
smp_mb__after_unlock_lock() is provided by the leaf rnp node locking
in rcu_gp_init() and rcu_gp_fqs_loop()?
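
As an aside, the either/or guarantee in the commit message reads to me
like the classic store-buffering pattern: each side does a store, then
a fully ordered load of the other side's variable. Below is a minimal
userspace sketch of that reading. It is not kernel code, and every name
in it (gp_seq, in_eqs, run_one(), count_forbidden()) is a hypothetical
stand-in. It enumerates all interleavings of two stores and two loads
and counts the outcome the ordering is meant to forbid: the GP kthread
sees the target outside an EQS while the target, after entering the EQS,
misses the new grace period sequence number. The kthread_reordered flag
models a missing barrier by letting the kthread's load run before its
store.

```c
#include <stdbool.h>

/* Hypothetical model, not kernel code.
 *
 *   GP kthread:  gp_seq = 1;  <full ordering>  snap = in_eqs;
 *   target CPU:  in_eqs = 1;  <full ordering>  seen = gp_seq;
 */
struct outcome {
	int snap;	/* value the GP kthread read from in_eqs */
	int seen;	/* value the target CPU read from gp_seq */
};

/*
 * Run one interleaving. Bit N of mask set means slot N is executed by
 * the GP kthread, otherwise by the target CPU. The caller passes masks
 * with exactly two bits set, so each side runs both of its steps.
 */
static struct outcome run_one(unsigned int mask, bool kthread_reordered)
{
	int gp_seq = 0, in_eqs = 0;
	int k_step = 0, t_step = 0;
	struct outcome o = { -1, -1 };

	for (int slot = 0; slot < 4; slot++) {
		if (mask & (1u << slot)) {
			/* Ordered: store then load. Reordered: load then store. */
			bool do_store = kthread_reordered ? (k_step == 1)
							  : (k_step == 0);
			if (do_store)
				gp_seq = 1;
			else
				o.snap = in_eqs;
			k_step++;
		} else {
			/* Target side is always ordered: store, then load. */
			if (t_step == 0)
				in_eqs = 1;
			else
				o.seen = gp_seq;
			t_step++;
		}
	}
	return o;
}

/* Count interleavings where both sides miss each other's store. */
static int count_forbidden(bool kthread_reordered)
{
	int forbidden = 0;

	for (unsigned int mask = 0; mask < 16; mask++) {
		int bits = 0;

		for (int slot = 0; slot < 4; slot++)
			bits += (mask >> slot) & 1u;
		if (bits != 2)
			continue;	/* each side must run exactly two steps */

		struct outcome o = run_one(mask, kthread_reordered);

		if (o.snap == 0 && o.seen == 0)
			forbidden++;
	}
	return forbidden;
}
```

In this model count_forbidden(false) returns 0 and
count_forbidden(true) returns 1, which is why the kthread side needs
full ordering from somewhere. The patch's point, as I understand it, is
that the rnp locking already supplies it. The model only covers
sequentially consistent interleavings of these four accesses and says
nothing about how the kernel primitives achieve that.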


Thanks
Neeraj

> + rdp->dynticks_snap = ct_dynticks_cpu_acquire(rdp->cpu);
> if (rcu_dynticks_in_eqs(rdp->dynticks_snap)) {
> trace_rcu_fqs(rcu_state.name, rdp->gp_seq, rdp->cpu, TPS("dti"));
> rcu_gpnum_ovf(rdp->mynode, rdp);
> --
> 2.40.1
>
>