Re: spin_lock implicit/explicit memory barrier

From: Boqun Feng
Date: Thu Aug 11 2016 - 22:47:31 EST


On Wed, Aug 10, 2016 at 12:17:57PM -0700, Davidlohr Bueso wrote:
> On Wed, 10 Aug 2016, Manfred Spraul wrote:
>
> > On 08/10/2016 02:05 AM, Benjamin Herrenschmidt wrote:
> > > On Tue, 2016-08-09 at 20:52 +0200, Manfred Spraul wrote:
> > > > Hi Benjamin, Hi Michael,
> > > >
> > > > regarding commit 51d7d5205d33 ("powerpc: Add smp_mb() to
> > > > arch_spin_is_locked()"):
> > > >
> > > > For the ipc/sem code, I would like to replace the spin_is_locked() with
> > > > a smp_load_acquire(), see:
> > > >
> > > > http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n367
> > > >
> > > > http://www.ozlabs.org/~akpm/mmots/broken-out/ipc-semc-fix-complex_count-vs-simple-op-race.patch
> > > >
> > > > To my understanding, I must now add a smp_mb(), otherwise it would be
> > > > broken on PowerPC:
> > > >
> > > > The approach that the memory barrier is added into spin_is_locked()
> > > > doesn't work because the code doesn't use spin_is_locked().
> > > >
> > > > Correct?
> > > Right, otherwise you aren't properly ordered. The current powerpc locks provide
> > > good protection between what's inside vs. what's outside the lock but not vs.
> > > the lock *value* itself, so if, like you do in the sem code, use the lock
> > > value as something that is relevant in term of ordering, you probably need
> > > an explicit full barrier.
>
> But the problem here is with spin_unlock_wait() (for ll/sc spin_lock) not seeing the
> store that makes the lock visibly taken and both threads end up exiting out of sem_lock();
> similar scenario to the spin_is_locked commit mentioned above, which is crossing of
> locks.
>
> Now that spin_unlock_wait() always implies at least an load-acquire barrier (for both
> ticket and qspinlocks, which is still x86 only), we wait on the full critical region.
>
> So this patch takes this locking scheme:
>
> CPU0 CPU1
> spin_lock(l) spin_lock(L)
> spin_unlock_wait(L) if (spin_is_locked(l))
> foo() foo()
>
> ... and converts it now to:
>
> CPU0 CPU1
> complex_mode = true spin_lock(l)
> smp_mb() <--- do we want a smp_mb() here?
> spin_unlock_wait(l) if (!smp_load_acquire(complex_mode))
> foo() foo()
>
> We should not be doing an smp_mb() right after a spin_lock(), makes no sense. The
> spinlock machinery should guarantee us the barriers in the unorthodox locking cases,
> such as this.
>

Right.

If you have:

6262db7c088b ("powerpc/spinlock: Fix spin_unlock_wait()")

you don't need smp_mb() after spin_lock() on PPC.

And, IIUC, if you have:

3a5facd09da8 ("arm64: spinlock: fix spin_unlock_wait for LSE atomics")
d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against
concurrent lockers")

you don't need smp_mb() after spin_lock() on ARM64.

And, IIUC, if you have:

2c6100227116 ("locking/qspinlock: Fix spin_unlock_wait() some more")

you don't need smp_mb() after spin_lock() on x86 with qspinlock.

Regards,
Boqun

> Thanks,

> Davidlohr

Attachment: signature.asc
Description: PGP signature