I don't think the address dependency is enough on its own. By that
reasoning, the following variant (WRC+addr+addr) would work too:
P0:
Wx = 1
P1:
Rx == 1
<address dep>
Wy = 1
P2:
Ry == 1
<address dep>
Rx == 0
So are you saying that this is also forbidden?
Imagine that P0 and P1 are two threads that share a store buffer. What
then?
To deal with this, a data dependency barrier or better must be inserted
between the address load and the data load:
        CPU 1                 CPU 2
        ===============       ===============
        { A == 1, B == 2, C == 3, P == &A, Q == &C }
        B = 4;
        <write barrier>
        WRITE_ONCE(P, &B);
                              Q = READ_ONCE(P);
                              <data dependency barrier> <----------- SYNC_RMB is here
                              D = *Q;
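
For anyone who wants to poke at this from userspace, here is a rough C11
rendering of the pointer example. It is only a sketch under assumptions: the
release store stands in for <write barrier> plus WRITE_ONCE(), the
memory_order_consume load stands in for READ_ONCE() plus the data dependency
barrier (compilers commonly strengthen consume to acquire), and the function
names are made up for illustration. It is not the kernel API.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Initial state from the example: A == 1, B == 2, P == &A. */
static int A = 1, B = 2;
static _Atomic(int *) P = &A;

/* What CPU 2 observed: the pointer it loaded and the value behind it. */
struct seen {
	int *q;
	int d;
};

/* CPU 1: update B, then publish its address. */
static void *cpu1_publish(void *arg)
{
	(void)arg;
	B = 4;
	/* Assumed stand-in for <write barrier> + WRITE_ONCE(P, &B). */
	atomic_store_explicit(&P, &B, memory_order_release);
	return NULL;
}

/* CPU 2: load the pointer, then dereference it. */
static void *cpu2_deref(void *arg)
{
	struct seen *s = arg;
	/* Assumed stand-in for READ_ONCE(P) + <data dependency barrier>. */
	int *q = atomic_load_explicit(&P, memory_order_consume);

	s->q = q;
	s->d = *q;
	return NULL;
}

/* Run both sides once; return true if the observed outcome is allowed. */
bool pointer_publish_ok(void)
{
	pthread_t t1, t2;
	struct seen s = { 0 };

	/* Reset the initial state so the function can be called repeatedly. */
	B = 2;
	atomic_store(&P, &A);

	pthread_create(&t2, NULL, cpu2_deref, &s);
	pthread_create(&t1, NULL, cpu1_publish, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* Q == &B must imply D == 4; otherwise Q == &A and D == 1. */
	return s.q == &B ? s.d == 4 : (s.q == &A && s.d == 1);
}
```

Calling pointer_publish_ok() in a loop should never return false. Which of
the two allowed outcomes shows up on a given run depends on scheduling.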
Another example of where data dependency barriers might be required is where a
number is read from memory and then used to calculate the index for an array
access:
        CPU 1                 CPU 2
        ===============       ===============
        { M[0] == 1, M[1] == 2, M[3] == 3, P == 0, Q == 3 }
        M[1] = 4;
        <write barrier>
        WRITE_ONCE(P, 1);
                              Q = READ_ONCE(P);
                              <data dependency barrier> <------------ SYNC_RMB is here
                              D = M[Q];
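
The array-index case can be sketched the same way in userspace C11. Again,
this is only an illustration under assumptions: the release store and the
memory_order_consume load are stand-ins for the barriers in the table, M[2]
is not given in the example and is simply left at zero, and the names
P_index, cpu1_store_index, cpu2_load_index and index_publish_ok are
hypothetical.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Initial state: M[0] == 1, M[1] == 2, M[3] == 3; M[2] is assumed to be 0. */
static int M[4] = { 1, 2, 0, 3 };
static atomic_int P_index;	/* P == 0 */

/* What CPU 2 observed: the index it loaded and the element it read. */
struct seen_index {
	int q;
	int d;
};

/* CPU 1: update M[1], then publish the index 1. */
static void *cpu1_store_index(void *arg)
{
	(void)arg;
	M[1] = 4;
	/* Assumed stand-in for <write barrier> + WRITE_ONCE(P, 1). */
	atomic_store_explicit(&P_index, 1, memory_order_release);
	return NULL;
}

/* CPU 2: load the index, then use it to read the array. */
static void *cpu2_load_index(void *arg)
{
	struct seen_index *s = arg;
	/* Assumed stand-in for READ_ONCE(P) + <data dependency barrier>. */
	int q = atomic_load_explicit(&P_index, memory_order_consume);

	s->q = q;
	s->d = M[q];
	return NULL;
}

/* Run both sides once; return true if the observed outcome is allowed. */
bool index_publish_ok(void)
{
	pthread_t t1, t2;
	struct seen_index s = { 0 };

	/* Reset the initial state so the function can be called repeatedly. */
	M[1] = 2;
	atomic_store(&P_index, 0);

	pthread_create(&t2, NULL, cpu2_load_index, &s);
	pthread_create(&t1, NULL, cpu1_store_index, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	/* Q == 1 must imply D == 4; otherwise Q == 0 and D == M[0] == 1. */
	return s.q == 1 ? s.d == 4 : (s.q == 0 && s.d == 1);
}
```

As with the pointer version, a false return would correspond to reading the
new index but the old array element, which is the outcome the barrier is
there to rule out.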