Re: [RFC][PATCH] mips: Fix arch_spin_unlock()

From: Will Deacon
Date: Thu Jan 28 2016 - 04:57:28 EST


On Wed, Jan 27, 2016 at 03:38:36PM -0800, Paul E. McKenney wrote:
> On Wed, Jan 27, 2016 at 03:21:58PM +0000, Will Deacon wrote:
> > On Wed, Jan 27, 2016 at 03:54:21PM +0100, Peter Zijlstra wrote:
> > > On Wed, Jan 27, 2016 at 11:43:48AM +0000, Will Deacon wrote:
> > > > Do you know whether a SYNC 18 (RELEASE) followed in program order by a
> > > > SYNC 17 (ACQUIRE) creates a full barrier (i.e. something like SYNC 16)?
> > > >
> > > > If not, you may need to implement smp_mb__after_unlock_lock for RCU
> > > > to ensure globally transitive unlock->lock ordering should you decide
> > > > to relax your locking barriers.
> > >
> > > You know that is a tricky question. Maybe its easier if you give the 3
> > > cpu litmus test that goes with it.
> >
> > Sure, I was building up to that. I just wanted to make sure the basics
> > were there (program-order, so same CPU) before we go any further. It
> > sounds like they are, so that's promising.
> >
> > > Maciej, the tricky point is what, if any, effect the
> > > SYNC_RELEASE+SYNC_ACQUIRE pair has on an unrelated CPU. Please review
> > > the TRANSITIVITY section in Documentation/memory-barriers.txt and
> > > replace <general barrier> with the RELEASE+ACQUIRE pair.
> > >
> > > We've all (Will, Paul and me) had much 'fun' trying to decipher the
> > > MIPS64r6 manual but failed to reach a conclusion on this.
> >
> > For the inter-thread case, Paul had a previous example along the lines
> > of:
> >
> >
> > Wx=1
> > WyRel=1
> >
> > RyAcq=1
> > Rz=0
> >
> > Wz=1
> > smp_mb()
> > Rx=0
>
> Each paragraph being a separate thread, correct? If so, agreed.

Yes, sorry for the shorthand:

- Each paragraph is a separate thread
- Wx=1 means WRITE_ONCE(x, 1), Rx=1 means READ_ONCE(x) returns 1
- WxRel means smp_store_release(x,1), RxAcq=1 means smp_load_acquire(x)
returns 1
- Everything is initially zero

> > and I suppose a variant of that:
> >
> >
> > Wx=1
> > WyRel=1
> >
> > RyAcq=1
> > Wz=1
> >
> > Rz=1
> > <address dependency>
> > Rx=0
>
> Agreed, this would be needed as well, along with the read-read and
> read-write variants. I picked the write-read version (Will's first
> test above) because write-read reordering is the most likely on
> hardware that I am aware of.

Question: if you replaced "Wz=1" with "WzRel=1" in my second test, would
it then be forbidden?

Will