Re: [PATCH rcu 1/6] rcu: Remove full ordering on second EQS snapshot

From: Frederic Weisbecker
Date: Wed Jun 05 2024 - 09:05:46 EST


Le Tue, Jun 04, 2024 at 03:26:47PM -0700, Paul E. McKenney a écrit :
> From: Frederic Weisbecker <frederic@xxxxxxxxxx>
>
> When the grace period kthread checks the extended quiescent state
> counter of a CPU, full ordering is necessary to ensure that either:
>
> * If the GP kthread observes the remote target in an extended quiescent
> state, then that target must observe all accesses prior to the current
> grace period, including the current grace period sequence number, once
> it exits that extended quiescent state. Also the GP kthread must
> observe all accesses performed by the target prior it entering in
> EQS.
>
> or:
>
> * If the GP kthread observes the remote target NOT in an extended
> quiescent state, then the target further entering in an extended
> quiescent state must observe all accesses prior to the current
> grace period, including the current grace period sequence number, once
> it enters that extended quiescent state. Also the GP kthread later
> observing that EQS must also observe all accesses performed by the
> target prior it entering in EQS.
>
> This ordering is explicitly performed both on the first EQS snapshot
> and on the second one as well through the combination of a preceding
> full barrier followed by an acquire read. However the second snapshot's
> full memory barrier is redundant and not needed to enforce the above
> guarantees:
>
> GP kthread Remote target
> ---- -----
> // Access prior GP
> WRITE_ONCE(A, 1)
> // first snapshot
> smp_mb()
> x = smp_load_acquire(EQS)
> // Access prior GP
> WRITE_ONCE(B, 1)
> // EQS enter
> // implied full barrier by atomic_add_return()
> atomic_add_return(RCU_DYNTICKS_IDX, EQS)
> // implied full barrier by atomic_add_return()
> READ_ONCE(A)
> // second snapshot
> y = smp_load_acquire(EQS)
> z = READ_ONCE(B)
>
> If the GP kthread above fails to observe the remote target in EQS
> (x not in EQS), the remote target will observe A == 1 after further
> entering in EQS. Then the second snapshot taken by the GP kthread only
> need to be an acquire read in order to observe z == 1.
>
> Therefore remove the needless full memory barrier on second snapshot.
>
> Signed-off-by: Frederic Weisbecker <frederic@xxxxxxxxxx>
> Signed-off-by: Paul E. McKenney <paulmck@xxxxxxxxxx>
> ---
> kernel/rcu/tree.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index 28c7031711a3f..f07b8bff4621b 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -321,7 +321,7 @@ static bool rcu_dynticks_in_eqs(int snap)
> */
> static bool rcu_dynticks_in_eqs_since(struct rcu_data *rdp, int snap)
> {
> - return snap != rcu_dynticks_snap(rdp->cpu);
> + return snap != ct_dynticks_cpu_acquire(rdp->cpu);

I guess I'm going to add a comment here to elaborate on the fact
it relies on the ordering enforced before the first snapshot. Would
you prefer a delta patch or an updated patch?

Thanks.

> }
>
> /*
> --
> 2.40.1
>
>