Re: [PATCH 2/2] tools/memory-model: Add write ordering by release-acquire and by locks

From: Alan Stern
Date: Thu Jul 05 2018 - 10:21:43 EST


On Wed, 4 Jul 2018, Will Deacon wrote:

> Hi Alan,
>
> On Tue, Jul 03, 2018 at 01:28:17PM -0400, Alan Stern wrote:
> > On Mon, 25 Jun 2018, Andrea Parri wrote:
> >
> > > On Fri, Jun 22, 2018 at 07:30:08PM +0100, Will Deacon wrote:
> > > > > > I think the second example would preclude us using LDAPR for load-acquire,
> > >
> > > > I don't think it's a moot point. We want new architectures to implement
> > > > acquire/release efficiently, and it's not unlikely that they will have
> > > > acquire loads that are similar in semantics to LDAPR. This patch prevents
> > > > them from doing so,
> > >
> > > By this same argument, you should not be a "big fan" of rfi-rel-acq in ppo ;)
> > > consider, e.g., the two litmus tests below: what am I missing?
> >
> > This is an excellent point, which seems to have gotten lost in the
> > shuffle. I'd like to see your comments.
>
> Yeah, sorry. Loads going on at the moment. You could ask herd instead of me
> though ;)

Indeed; and the answer was as expected. Sometimes one gains additional
insights by asking a person, though.

> > In essence, if you're using release-acquire instructions that only
> > provide RCpc consistency, does store-release followed by load-acquire
> > of the same address provide read-read ordering? In theory it doesn't
> > have to, because if the value from the store-release is forwarded to
> > the load-acquire then:
> >
> > LOAD A
> > STORE-RELEASE X, v
> > LOAD-ACQUIRE X
> > LOAD B
> >
> > could be executed by the CPU in the order:
> >
> > LOAD-ACQUIRE X
> > LOAD B
> > LOAD A
> > STORE-RELEASE X, v
> >
> > thereby accessing A and B out of program order without violating the
> > requirements on the release or the acquire.
> >
> > Of course PPC doesn't allow this, but should we rule it out entirely?
>
> This would be allowed if LOAD-ACQUIRE was implemented using LDAPR on Arm.
> I don't think we should be ruling out architectures using RCpc
> acquire/release primitives, because doing so just feels like an artifact of
> most architectures building these out of fences today.
>
> It's funny really, because from an Arm-perspective I don't plan to stray
> outside of RCsc, but I feel like other weak architectures aren't being
> well represented here. If we just care about x86, Arm and Power (and assume
> that Power doesn't plan to implement RCpc acquire/release instructions)
> then we're good to tighten things up. But I fear that RISC-V should probably
> be more engaged (adding Daniel) and who knows about MIPS or these other
> random architectures popping up on linux-arch.

I don't object to having weak versions of acquire/release in the LKMM.
Perhaps the stronger versions could be kept in the hardware model
(which has not been published and is not in the kernel source), but
even that might be a bad idea in view of what RISC-V is liable to do.

> > > C MP+fencewmbonceonce+pooncerelease-rfireleaseacquire-poacquireonce
> > >
> > > {}
> > >
> > > P0(int *x, int *y)
> > > {
> > > 	WRITE_ONCE(*x, 1);
> > > 	smp_wmb();
> > > 	WRITE_ONCE(*y, 1);
> > > }
> > >
> > > P1(int *x, int *y, int *z)
> > > {
> > > 	r0 = READ_ONCE(*y);
> > > 	smp_store_release(z, 1);
> > > 	r1 = smp_load_acquire(z);
> > > 	r2 = READ_ONCE(*x);
> > > }
> > >
> > > exists (1:r0=1 /\ 1:r1=1 /\ 1:r2=0)
> > >
> > >
> > > AArch64 MP+dmb.st+popl-rfilq-poqp
> > > "DMB.STdWW Rfe PodRWPL RfiLQ PodRRQP Fre"
> > > Generator=diyone7 (version 7.49+02(dev))
> > > Prefetch=0:x=F,0:y=W,1:y=F,1:x=T
> > > Com=Rf Fr
> > > Orig=DMB.STdWW Rfe PodRWPL RfiLQ PodRRQP Fre
> > > {
> > > 0:X1=x; 0:X3=y;
> > > 1:X1=y; 1:X3=z; 1:X6=x;
> > > }
> > >  P0          | P1            ;
> > >  MOV W0,#1   | LDR W0,[X1]   ;
> > >  STR W0,[X1] | MOV W2,#1     ;
> > >  DMB ST      | STLR W2,[X3]  ;
> > >  MOV W2,#1   | LDAPR W4,[X3] ;
> > >  STR W2,[X3] | LDR W5,[X6]   ;
> > > exists
> > > (1:X0=1 /\ 1:X4=1 /\ 1:X5=0)
>
> (you can also run this yourself, since 'Q' is supported in the .cat file
> I contributed to herdtools7)
>
> Test MP+dmb.sy+popl-rfilq-poqp Allowed
> States 4
> 1:X0=0; 1:X4=1; 1:X5=0;
> 1:X0=0; 1:X4=1; 1:X5=1;
> 1:X0=1; 1:X4=1; 1:X5=0;
> 1:X0=1; 1:X4=1; 1:X5=1;
> Ok
> Witnesses
> Positive: 1 Negative: 3
> Condition exists (1:X0=1 /\ 1:X4=1 /\ 1:X5=0)
> Observation MP+dmb.sy+popl-rfilq-poqp Sometimes 1 3
> Time MP+dmb.sy+popl-rfilq-poqp 0.01
> Hash=61858b7b59a6310d869f99cd05718f96
>
> > There's also read-write ordering, in the form of the LB pattern:
> >
> > P0(int *x, int *y, int *z)
> > {
> > 	r0 = READ_ONCE(*x);
> > 	smp_store_release(z, 1);
> > 	r1 = smp_load_acquire(z);
> > 	WRITE_ONCE(*y, 1);
> > }
> >
> > P1(int *x, int *y)
> > {
> > 	r2 = READ_ONCE(*y);
> > 	smp_mb();
> > 	WRITE_ONCE(*x, 1);
> > }
> >
> > exists (0:r0=1 /\ 1:r2=1)
>
> The access types are irrelevant to the acquire/release primitives, so yes
> that's also allowed.
>
> > Would this be allowed if smp_load_acquire() was implemented with LDAPR?
> > If the answer is yes then we will have to remove the rfi-rel-acq and
> > rel-rf-acq-po relations from the memory model entirely.
>
> I don't understand what you mean by "rfi-rel-acq-po", and I assume you mean
> rel-rfi-acq-po for the other? Sounds like I'm confused here.

"rfi-rel-acq" is the relation which was removed by the first of my two
patches (it is now back in business since Paul reverted the commits),
and "rel-rf-acq-po" is the relation that was introduced to replace it.

At any rate, it looks like instead of strengthening the relation, I
should write a patch that removes it entirely. I will also add new,
stronger relations for use with locking, essentially making spin_lock
and spin_unlock RCsc.

Alan