Re: [PATCH 2/2] tools/memory-model: Add write ordering by release-acquire and by locks

From: Paul E. McKenney
Date: Wed Jul 04 2018 - 07:26:51 EST


On Tue, Jul 03, 2018 at 01:28:17PM -0400, Alan Stern wrote:
> Will:
>
> On Mon, 25 Jun 2018, Andrea Parri wrote:
>
> > On Fri, Jun 22, 2018 at 07:30:08PM +0100, Will Deacon wrote:
> > > > > I think the second example would preclude us using LDAPR for load-acquire,
> >
> > > I don't think it's a moot point. We want new architectures to implement
> > > acquire/release efficiently, and it's not unlikely that they will have
> > > acquire loads that are similar in semantics to LDAPR. This patch prevents
> > > them from doing so,
> >
> > By this same argument, you should not be a "big fan" of rfi-rel-acq in ppo ;)
> > consider, e.g., the two litmus tests below: what am I missing?
>
> This is an excellent point, which seems to have gotten lost in the
> shuffle. I'd like to see your comments.
>
> In essence, if you're using release-acquire instructions that only
> provide RCpc consistency, does store-release followed by load-acquire
> of the same address provide read-read ordering? In theory it doesn't
> have to, because if the value from the store-release is forwarded to
> the load-acquire then:
>
> LOAD A
> STORE-RELEASE X, v
> LOAD-ACQUIRE X
> LOAD B
>
> could be executed by the CPU in the order:
>
> LOAD-ACQUIRE X
> LOAD B
> LOAD A
> STORE-RELEASE X, v
>
> thereby accessing A and B out of program order without violating the
> requirements on the release or the acquire.
>
> Of course PPC doesn't allow this, but should we rule it out entirely?
>
> > C MP+fencewmbonceonce+pooncerelease-rfireleaseacquire-poacquireonce
> >
> > {}
> >
> > P0(int *x, int *y)
> > {
> > 	WRITE_ONCE(*x, 1);
> > 	smp_wmb();
> > 	WRITE_ONCE(*y, 1);
> > }
> >
> > P1(int *x, int *y, int *z)
> > {
> > 	r0 = READ_ONCE(*y);
> > 	smp_store_release(z, 1);
> > 	r1 = smp_load_acquire(z);
> > 	r2 = READ_ONCE(*x);
> > }
> >
> > exists (1:r0=1 /\ 1:r1=1 /\ 1:r2=0)
> >
> >
> > AArch64 MP+dmb.st+popl-rfilq-poqp
> > "DMB.STdWW Rfe PodRWPL RfiLQ PodRRQP Fre"
> > Generator=diyone7 (version 7.49+02(dev))
> > Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
> > Com=Rf Fr
> > Orig=DMB.STdWW Rfe PodRWPL RfiLQ PodRRQP Fre
> > {
> > 0:X1=x; 0:X3=y;
> > 1:X1=y; 1:X3=z; 1:X6=x;
> > }
> >  P0          | P1            ;
> >  MOV W0,#1   | LDR W0,[X1]   ;
> >  STR W0,[X1] | MOV W2,#1     ;
> >  DMB ST      | STLR W2,[X3]  ;
> >  MOV W2,#1   | LDAPR W4,[X3] ;
> >  STR W2,[X3] | LDR W5,[X6]   ;
> > exists
> > (1:X0=1 /\ 1:X4=1 /\ 1:X5=0)
>
> There's also read-write ordering, in the form of the LB pattern:
>
> P0(int *x, int *y, int *z)
> {
> 	r0 = READ_ONCE(*x);
> 	smp_store_release(z, 1);
> 	r1 = smp_load_acquire(z);
> 	WRITE_ONCE(*y, 1);
> }
>
> P1(int *x, int *y)
> {
> 	r2 = READ_ONCE(*y);
> 	smp_mb();
> 	WRITE_ONCE(*x, 1);
> }
>
> exists (0:r0=1 /\ 1:r2=1)
>
> Would this be allowed if smp_load_acquire() was implemented with LDAPR?
> If the answer is yes then we will have to remove the rfi-rel-acq and
> rel-rf-acq-po relations from the memory model entirely.
>
> Alan
>
> PS: Paul, is the patch which introduced rel-rf-acq-po currently present
> in any of your branches? I couldn't find it.

It is not; I will add it back in. I misinterpreted your "drop this
patch" on 2/2 as "drop both patches". Please accept my apologies!

Just to double-check, the patch below should be added, correct?

Thanx, Paul
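
For readers following along, here is a minimal sketch (not part of the
patch) of how the old rfi-rel-acq rule and the new rel-rf-acq-po rule
from the patch below relate on P1 of the MP litmus test quoted above.
Everything in it is an assumption made for illustration: the four-event
toy execution, the names R_Y, W_REL_Z, R_ACQ_Z, R_X and ppo_orders(),
and the reduction of ppo to just po-rel, acq-po and one of the two
rules. It is a hand-rolled check, not herd7 output.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy execution (hypothetical): the four events of P1, all on one CPU,
 * so rf == rfi here and the "& int" restriction is trivially satisfied.
 */
enum { R_Y, W_REL_Z, R_ACQ_Z, R_X, N };

static const bool is_release[N] = { [W_REL_Z] = true };
static const bool is_acquire[N] = { [R_ACQ_Z] = true };

/* po: program order is simply index order on this single CPU. */
static bool po(int a, int b) { return a < b; }

/* rf: the load-acquire reads from the store-release. */
static bool rf(int a, int b) { return a == W_REL_Z && b == R_ACQ_Z; }

/* Old rule: [Release] ; rfi ; [Acquire] */
static bool rfi_rel_acq(int a, int b)
{
	return is_release[a] && rf(a, b) && is_acquire[b];
}

/* New rule: [Release] ; rf ; [Acquire] ; po */
static bool rel_rf_acq_po(int a, int b)
{
	for (int k = 0; k < N; k++)
		if (is_release[a] && rf(a, k) && is_acquire[k] && po(k, b))
			return true;
	return false;
}

/* ppo cut down to the pieces used here: po-rel, acq-po and one rule. */
static bool ppo(int a, int b, bool use_new_rule)
{
	bool po_rel = po(a, b) && is_release[b];
	bool acq_po = is_acquire[a] && po(a, b);
	bool rule = use_new_rule ? rel_rf_acq_po(a, b) : rfi_rel_acq(a, b);

	return po_rel || acq_po || rule;
}

/* Is there a chain of ppo links from "from" to "to"? */
bool ppo_orders(int from, int to, bool use_new_rule)
{
	bool reach[N][N];

	assert(from >= 0 && from < N && to >= 0 && to < N);
	for (int i = 0; i < N; i++)
		for (int j = 0; j < N; j++)
			reach[i][j] = ppo(i, j, use_new_rule);
	/* Warshall-style transitive closure over the ppo links. */
	for (int k = 0; k < N; k++)
		for (int i = 0; i < N; i++)
			for (int j = 0; j < N; j++)
				if (reach[i][k] && reach[k][j])
					reach[i][j] = true;
	return reach[from][to];
}
```

In this toy, ppo_orders(R_Y, R_X, false) and ppo_orders(R_Y, R_X, true)
both hold: the old rule gets there via release -> acquire -> R_X, the new
one via a direct release -> R_X link. It says nothing about what an
LDAPR-based smp_load_acquire() would permit in hardware.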

------------------------------------------------------------------------

Date: Thu, 21 Jun 2018 13:26:49 -0400 (EDT)
From: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
To: LKMM Maintainers -- Akira Yokosawa <akiyks@xxxxxxxxx>, Andrea Parri
<andrea.parri@xxxxxxxxxxxxxxxxxxxx>, Boqun Feng
<boqun.feng@xxxxxxxxx>, David Howells <dhowells@xxxxxxxxxx>,
Jade Alglave <j.alglave@xxxxxxxxx>, Luc Maranget <luc.maranget@xxxxxxxx>,
Nicholas Piggin <npiggin@xxxxxxxxx>,
"Paul E. McKenney" <paulmck@xxxxxxxxxxxxxxxxxx>, Peter Zijlstra <peterz@xxxxxxxxxxxxx>,
Will Deacon <will.deacon@xxxxxxx>
cc: Kernel development list <linux-kernel@xxxxxxxxxxxxxxx>
Subject: [PATCH 1/2] tools/memory-model: Change rel-rfi-acq ordering to
(rel-rf-acq-po & int)
Message-ID: <Pine.LNX.4.44L0.1806211315550.2381-100000@xxxxxxxxxxxxxxxxxxxx>

This patch changes the LKMM rule which says that an acquire which
reads from an earlier release must be executed after that release (in
other words, the release cannot be forwarded to the acquire). This is
not true on PowerPC, for example.

What is true instead is that any instruction following the acquire
must be executed after the release. On some architectures this is
because a write-release cannot be forwarded to a read-acquire; on
others (including PowerPC) it is because the implementation of
smp_load_acquire() places a memory barrier immediately after the
load.

This change to the model does not cause any change to the model's
predictions. This is because any link starting from a load must be an
instance of either po or fr. In the po case, the new rule will still
provide ordering. In the fr case, we also have ordering because there
must be a co link to the same destination starting from the
write-release.

Signed-off-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>

---


[as1870]


tools/memory-model/Documentation/explanation.txt | 35 ++++++++++++-----------
tools/memory-model/linux-kernel.cat | 6 +--
2 files changed, 22 insertions(+), 19 deletions(-)

Index: usb-4.x/tools/memory-model/linux-kernel.cat
===================================================================
--- usb-4.x.orig/tools/memory-model/linux-kernel.cat
+++ usb-4.x/tools/memory-model/linux-kernel.cat
@@ -38,7 +38,7 @@ let strong-fence = mb | gp
(* Release Acquire *)
let acq-po = [Acquire] ; po ; [M]
let po-rel = [M] ; po ; [Release]
-let rfi-rel-acq = [Release] ; rfi ; [Acquire]
+let rel-rf-acq-po = [Release] ; rf ; [Acquire] ; po

(**********************************)
(* Fundamental coherence ordering *)
@@ -60,9 +60,9 @@ let dep = addr | data
let rwdep = (dep | ctrl) ; [W]
let overwrite = co | fr
let to-w = rwdep | (overwrite & int)
-let to-r = addr | (dep ; rfi) | rfi-rel-acq
+let to-r = addr | (dep ; rfi)
let fence = strong-fence | wmb | po-rel | rmb | acq-po
-let ppo = to-r | to-w | fence
+let ppo = to-r | to-w | fence | (rel-rf-acq-po & int)

(* Propagation: Ordering from release operations and strong fences. *)
let A-cumul(r) = rfe? ; r
Index: usb-4.x/tools/memory-model/Documentation/explanation.txt
===================================================================
--- usb-4.x.orig/tools/memory-model/Documentation/explanation.txt
+++ usb-4.x/tools/memory-model/Documentation/explanation.txt
@@ -1067,27 +1067,30 @@ allowing out-of-order writes like this t
violating the write-write coherence rule by requiring the CPU not to
send the W write to the memory subsystem at all!)

-There is one last example of preserved program order in the LKMM: when
-a load-acquire reads from an earlier store-release. For example:
+There is one last example of preserved program order in the LKMM; it
+applies to instructions po-after a load-acquire which reads from an
+earlier store-release. For example:

smp_store_release(&x, 123);
r1 = smp_load_acquire(&x);
+ WRITE_ONCE(y, 246);

If the smp_load_acquire() ends up obtaining the 123 value that was
-stored by the smp_store_release(), the LKMM says that the load must be
-executed after the store; the store cannot be forwarded to the load.
-This requirement does not arise from the operational model, but it
-yields correct predictions on all architectures supported by the Linux
-kernel, although for differing reasons.
-
-On some architectures, including x86 and ARMv8, it is true that the
-store cannot be forwarded to the load. On others, including PowerPC
-and ARMv7, smp_store_release() generates object code that starts with
-a fence and smp_load_acquire() generates object code that ends with a
-fence. The upshot is that even though the store may be forwarded to
-the load, it is still true that any instruction preceding the store
-will be executed before the load or any following instructions, and
-the store will be executed before any instruction following the load.
+written by the smp_store_release(), the LKMM says that the store to y
+must be executed after the store to x. In fact, the only way this
+could fail would be if the store-release was forwarded to the
+load-acquire; the LKMM says it holds even in that case. This
+requirement does not arise from the operational model, but it yields
+correct predictions on all architectures supported by the Linux
+kernel, although for differing reasons:
+
+On some architectures, including x86 and ARMv8, a store-release cannot
+be forwarded to a load-acquire. On others, including PowerPC and
+ARMv7, smp_load_acquire() generates object code that ends with a
+fence. The result is that even though the store-release may be
+forwarded to the load-acquire, it is still true that the store-release
+(and all preceding instructions) will be executed before any
+instruction following the load-acquire.


AND THEN THERE WAS ALPHA