Re: rcu_read_lock lost its compiler barrier
From: Herbert Xu
Date: Mon Jun 03 2019 - 01:30:31 EST
On Sun, Jun 02, 2019 at 08:47:07PM -0700, Paul E. McKenney wrote:
> 1. These guarantees are of full memory barriers, -not- compiler
What I'm saying is that wherever they are, they must come with
compiler barriers. I'm not aware of any synchronisation mechanism
in the kernel that gives a memory barrier without a compiler barrier.
> 2. These rules don't say exactly where these full memory barriers
> go. SRCU is at one extreme, placing those full barriers in
> srcu_read_lock() and srcu_read_unlock(), and !PREEMPT Tree RCU
> at the other, placing these barriers entirely within the callback
> queueing/invocation, grace-period computation, and the scheduler.
> Preemptible Tree RCU is in the middle, with rcu_read_unlock()
> sometimes including a full memory barrier, but other times with
> the full memory barrier being confined as it is with !PREEMPT
> Tree RCU.
The rules do say that the (full) memory barrier must precede any
RCU read-side that occur after the synchronize_rcu and after the
end of any RCU read-side that occur before the synchronize_rcu.
All I'm arguing is that wherever that full mb is, as long as it
also carries with it a barrier() (which it must do if it's done
using an existing kernel mb/locking primitive), then we're fine.
> Interleaving and inserting full memory barriers as per the rules above:
> CPU1: WRITE_ONCE(a, 1)
> CPU1: synchronize_rcu
> /* Could put a full memory barrier here, but it wouldn't help. */
Let's put them in because I think they are critical. smp_mb() also
carries with it a barrier().
> CPU2: rcu_read_lock();
> CPU1: b = 2;
> CPU2: if (READ_ONCE(a) == 0)
> CPU2: if (b != 1) /* Weakly ordered CPU moved this up! */
> CPU2: b = 1;
> CPU2: rcu_read_unlock
> In fact, CPU2's load from b might be moved up to race with CPU1's store,
> which (I believe) is why the model complains in this case.
Let's put aside my doubt over how we're even allowing a compiler
b = 1
if (b != 1)
b = 1
Since you seem to be assuming that (a == 0) is true in this case
(as the assignment b = 1 is carried out), then because of the
presence of the full memory barrier, the RCU read-side section
must have started prior to the synchronize_rcu. This means that
synchronize_rcu is not allowed to return until at least the end
of the grace period, or at least until the end of rcu_read_unlock.
So it actually should be:
CPU1: WRITE_ONCE(a, 1)
CPU1: synchronize_rcu called
/* Could put a full memory barrier here, but it wouldn't help. */
CPU2: grace period starts
CPU2: if (READ_ONCE(a) == 0)
CPU2: if (b != 1) /* Weakly ordered CPU moved this up! */
CPU2: b = 1;
CPU2: grace period ends
/* This full memory barrier is also guaranteed by RCU. */
CPU1 synchronize_rcu returns
CPU1: b = 2;
Email: Herbert Xu <herbert@xxxxxxxxxxxxxxxxxxx>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt