Re: [PATCH locking/Documentation 1/2] Add note of release-acquire store vulnerability

From: Paul E. McKenney
Date: Fri Sep 30 2016 - 08:14:18 EST


On Fri, Sep 30, 2016 at 11:57:38AM +0200, Peter Zijlstra wrote:
> On Thu, Sep 29, 2016 at 12:18:58PM -0700, Paul E. McKenney wrote:
> > On Thu, Sep 29, 2016 at 08:44:39PM +0200, Peter Zijlstra wrote:
>
> > > How about something like so on PPC?
> > >
> > > P0(int *x, int *y)
> > > {
> > > WRITE_ONCE(*x, 1);
> > > smp_store_release(y, 1);
> > > }
> > >
> > > P1(int *x, int *y)
> > > {
> > > WRITE_ONCE(x, 2);
> >
> > Need "WRITE_ONCE(*x, 2)" here.
> >
> > > smp_store_release(y, 2);
> > > }
> > >
> > > P2(int *x, int *y)
> > > {
> > > r1 = smp_load_acquire(y);
> > > r2 = READ_ONCE(*x);
> > > }
> > >
> > > (((x==1 && y==2) | (x==2 && y==1)) && (r1==1 || r1==2) && r2==0)
> >
> > That exists-clause is quite dazzling... So if each of P0 and P1
> > win, but on different stores, and if P2 follows one or the other
> > of P0 or P1, can r2 get the pre-initialization value for x?
> >
> > > If you execute P0 and P1 concurrently and one store of each 'wins' the
> > > LWSYNC of either is null and void, and therefore P2 is unordered and can
> > > observe r2==0.
> >
> > That vaguely resembles the infamous Z6.3, but only vaguely. The Linux-kernel
> > memory model says "forbidden" to this:
>
> https://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc710.html
>
> That one, right?

That is the one! Prohibiting the cycle requires smp_mb() on both threads
0 and 1 on the one hand, or on both threads 0 and 2 on the other hand.

> Hmm, I seem to remember something else.. /me goes poke through history
> and comes up with:
>
> https://lkml.kernel.org/r/20160115215853.GC3818@xxxxxxxxxxxxxxxxxx
>
> So what was that about then? I remember it being a completely
> nonsensical case, but a weird one.

That was an Alan Stern example where PowerPC prohibits the outcome,
and the ppcmem tool agrees, but where herd does not. (ppcmem is giving
the correct answer.)

But this is specific to PowerPC. I would not advise writing code that
relies on this one. ;-)

> > So let's try PPCMEM. If PPCMEM allows it, then the kernel model is
> > clearly broken.
> >
> > PPC PeterZijlstra+o-r+o-r+a-o-SB.litmus
> > {
> > 0:r1=1; 0:r2=2; 0:r3=x; 0:r4=y;
> > 1:r1=1; 1:r2=2; 1:r3=x; 1:r4=y;
> > 2:r3=x; 2:r4=y;
> > }
> >  P0           | P1           | P2           ;
> >  stw r1,0(r3) | stw r2,0(r3) | lwz r1,0(r4) ;
> >  lwsync       | lwsync       | lwsync       ;
> >  stw r1,0(r4) | stw r2,0(r4) | lwz r2,0(r3) ;
> > exists
> > (((x=1 /\ y=2) \/ (x=2 /\ y=1)) /\ (2:r1=1 \/ 2:r1=2) /\ 2:r2=0)
>
> > Or did I incorrectly translate your litmus test?
>
> Looks about right.
>
> Still not seeing how that is prohibited though. My reasoning is as
> follows:
>
> - P0 and P1 both store to x, one loses (say P0). Effectively only P1
> does a store.
>
> - P0 and P1 both store to y, one loses (say P1). Effectively only P0
> does a store.

PowerPC does not "obscure" stores, so both stores really are there and
the lwsync really has effect on all CPUs. From what I understand, even
CPUs that do obscure stores only do so in the case of repeated stores
by the same CPU to the same variable, and the above litmus test doesn't
have this.

So all the stores happen, and each CPU's stores are at least locally
ordered.

> - P2 reads y, sees the value from P0.

Fair enough!

> - P2 does lwsync, which constrains P2 to not issue the load of x
> before this. It also forms a (local) sync-point with P0 for having
> seen its store to y.
>
> - P2 reads x, sees the initial value because the store from P1 hasn't
> been propagated yet.
>
> It will not see the store P0 did to x, since that didn't happen.

Well, it saw the store to y, so it absolutely must see one or the other of
the stores to x in this particular litmus test. If P1's store overwrote
P0's store, then P2 has to see P1's store, for example.
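
The chain of reasoning above (a reader that sees a store to y must also
see a store to x) can be sanity-checked with a small brute-force sketch.
It assumes sequential consistency, which is strictly stronger than
PowerPC with lwsync, so it says nothing about what PowerPC allows; it
only shows that no plain interleaving yields r1 != 0 together with
r2 == 0. The names interleavings, run and sc_states are made up for this
illustration.

```python
from itertools import permutations

# Each thread is a list of operations in program order.
# ("st", var, value) stores a value; ("ld", var, reg) loads into a register.
P0 = [("st", "x", 1), ("st", "y", 1)]
P1 = [("st", "x", 2), ("st", "y", 2)]
P2 = [("ld", "y", "r1"), ("ld", "x", "r2")]
THREADS = [P0, P1, P2]


def interleavings(threads):
    """Yield every schedule (sequence of thread ids) preserving program order."""
    # Thread i appears len(threads[i]) times; its k-th occurrence runs its k-th op.
    slots = [i for i, t in enumerate(threads) for _ in t]
    for schedule in set(permutations(slots)):
        yield schedule


def run(schedule, threads):
    """Execute one schedule against a single shared memory (the SC assumption)."""
    mem = {"x": 0, "y": 0}
    regs = {}
    pc = [0] * len(threads)
    for tid in schedule:
        op, var, arg = threads[tid][pc[tid]]
        pc[tid] += 1
        if op == "st":
            mem[var] = arg
        else:
            regs[arg] = mem[var]
    # Final state in the same order herd prints it: 2:r1, 2:r2, x, y.
    return (regs["r1"], regs["r2"], mem["x"], mem["y"])


sc_states = {run(s, THREADS) for s in interleavings(THREADS)}

# States where P2 saw a store to y but still read the initial value of x.
bad = [s for s in sc_states if s[0] != 0 and s[1] == 0]

print(sorted(sc_states))
print("r1 != 0 with r2 == 0:", bad)
```

Because the model is only an interleaving of whole operations, it cannot
reproduce any outcome that depends on stores propagating to different
CPUs at different times; those need herd or ppcmem.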

> Assuming I'm wrong on that last part, is then the following possible?
>
> (x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)
>
> Where we see a store that didn't happen?

Again, both stores really did happen. ;-)

So with this litmus test:

PPC PeterZijlstra+o-r+o-r+a-o-SB.litmus
{
0:r1=1; 0:r2=2; 0:r3=x; 0:r4=y;
1:r1=1; 1:r2=2; 1:r3=x; 1:r4=y;
2:r3=x; 2:r4=y;
}
 P0           | P1           | P2           ;
 stw r1,0(r3) | stw r2,0(r3) | lwz r1,0(r4) ;
 lwsync       | lwsync       | lwsync       ;
 stw r1,0(r4) | stw r2,0(r4) | lwz r2,0(r3) ;
exists
(x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)

The herd tool says:

Test PeterZijlstra+o-r+o-r+a-o-SB Allowed
States 24
2:r1=0; 2:r2=0; x=1; y=1;
2:r1=0; 2:r2=0; x=1; y=2;
2:r1=0; 2:r2=0; x=2; y=1;
2:r1=0; 2:r2=0; x=2; y=2;
2:r1=0; 2:r2=1; x=1; y=1;
2:r1=0; 2:r2=1; x=1; y=2;
2:r1=0; 2:r2=1; x=2; y=1;
2:r1=0; 2:r2=1; x=2; y=2;
2:r1=0; 2:r2=2; x=1; y=1;
2:r1=0; 2:r2=2; x=1; y=2;
2:r1=0; 2:r2=2; x=2; y=1;
2:r1=0; 2:r2=2; x=2; y=2;
2:r1=1; 2:r2=1; x=1; y=1;
2:r1=1; 2:r2=1; x=1; y=2;
2:r1=1; 2:r2=1; x=2; y=1;
2:r1=1; 2:r2=1; x=2; y=2;
2:r1=1; 2:r2=2; x=2; y=1;
2:r1=1; 2:r2=2; x=2; y=2;
2:r1=2; 2:r2=1; x=1; y=1;
2:r1=2; 2:r2=1; x=1; y=2;
2:r1=2; 2:r2=2; x=1; y=1;
2:r1=2; 2:r2=2; x=1; y=2;
2:r1=2; 2:r2=2; x=2; y=1;
2:r1=2; 2:r2=2; x=2; y=2;
Ok
Witnesses
Positive: 1 Negative: 23
Condition exists (x=2 /\ y=1 /\ 2:r1=1 /\ 2:r2=1)
Observation PeterZijlstra+o-r+o-r+a-o-SB Sometimes 1 23
Hash=c32afd1ac8bfee7d4b23a27e783d0998

So herd believes that it can happen. I was also able to force the web
ppcmem tool (https://www.cl.cam.ac.uk/~pes20/ppcmem/index.html) to get
into this state with the following sequence of choices:

[0;5;4;0;0;4;4;2;2;2;4;4;1;1;1;7;6;0;0;3;4;0;1;2;5;0;0]

So, yes, this can happen, architecturally at least.
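
For readers who want to cross-check the herd state list by script, here
is a minimal sketch. It parses the 24 states copied verbatim from the
output above, confirms that the x=2, y=1, 2:r1=1, 2:r2=1 witness is
present, and looks for any state where P2 saw a store to y but read the
initial value of x. The helper parse_states and the variable names are
made up for this illustration, and it assumes that herd would list the
same final states for the earlier exists clause, since both clauses
mention the same registers and locations.

```python
import re

# The 24 final states reported by herd, copied verbatim from the output above.
HERD_STATES = """
2:r1=0; 2:r2=0; x=1; y=1;
2:r1=0; 2:r2=0; x=1; y=2;
2:r1=0; 2:r2=0; x=2; y=1;
2:r1=0; 2:r2=0; x=2; y=2;
2:r1=0; 2:r2=1; x=1; y=1;
2:r1=0; 2:r2=1; x=1; y=2;
2:r1=0; 2:r2=1; x=2; y=1;
2:r1=0; 2:r2=1; x=2; y=2;
2:r1=0; 2:r2=2; x=1; y=1;
2:r1=0; 2:r2=2; x=1; y=2;
2:r1=0; 2:r2=2; x=2; y=1;
2:r1=0; 2:r2=2; x=2; y=2;
2:r1=1; 2:r2=1; x=1; y=1;
2:r1=1; 2:r2=1; x=1; y=2;
2:r1=1; 2:r2=1; x=2; y=1;
2:r1=1; 2:r2=1; x=2; y=2;
2:r1=1; 2:r2=2; x=2; y=1;
2:r1=1; 2:r2=2; x=2; y=2;
2:r1=2; 2:r2=1; x=1; y=1;
2:r1=2; 2:r2=1; x=1; y=2;
2:r1=2; 2:r2=2; x=1; y=1;
2:r1=2; 2:r2=2; x=1; y=2;
2:r1=2; 2:r2=2; x=2; y=1;
2:r1=2; 2:r2=2; x=2; y=2;
"""


def parse_states(text):
    """Turn each 'name=value;' line into a dict of integer values."""
    states = []
    for line in text.strip().splitlines():
        fields = re.findall(r"(\S+?)=(\d+);", line)
        states.append({name: int(value) for name, value in fields})
    return states


states = parse_states(HERD_STATES)

# The outcome asked about in the second exists clause.
witness = {"2:r1": 1, "2:r2": 1, "x": 2, "y": 1}

# States where P2 saw a store to y (r1 is 1 or 2) but read x's initial value.
stale = [s for s in states if s["2:r1"] in (1, 2) and s["2:r2"] == 0]

print("states:", len(states))
print("witness present:", witness in states)
print("stale-x states:", stale)
```

An empty stale-x list is what the "forbidden" verdict on the first
exists clause would predict, while the witness being present matches the
"Sometimes" observation herd printed.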

Thanx, Paul