Re: spin_lock implicit/explicit memory barrier
From: Paul E. McKenney
Date: Wed Aug 10 2016 - 16:52:11 EST
On Wed, Aug 10, 2016 at 08:21:22PM +0200, Manfred Spraul wrote:
> Hi,
>
> [adding Peter, correcting Davidlohr's mail address]
>
> On 08/10/2016 02:05 AM, Benjamin Herrenschmidt wrote:
> >On Tue, 2016-08-09 at 20:52 +0200, Manfred Spraul wrote:
> >>Hi Benjamin, Hi Michael,
> >>
> >>regarding commit 51d7d5205d33 ("powerpc: Add smp_mb() to
> >>arch_spin_is_locked()"):
> >>
> >>For the ipc/sem code, I would like to replace the spin_is_locked() with
> >>a smp_load_acquire(), see:
> >>
> >>http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n367
> >>
> >>http://www.ozlabs.org/~akpm/mmots/broken-out/ipc-semc-fix-complex_count-vs-simple-op-race.patch
> >>
> >>To my understanding, I must now add a smp_mb(), otherwise it would be
> >>broken on PowerPC:
> >>
> >>The approach that the memory barrier is added into spin_is_locked()
> >>doesn't work because the code doesn't use spin_is_locked().
> >>
> >>Correct?
> >Right, otherwise you aren't properly ordered. The current powerpc locks provide
> >good protection between what's inside vs. what's outside the lock but not vs.
> >the lock *value* itself, so if, like you do in the sem code, use the lock
> >value as something that is relevant in term of ordering, you probably need
> >an explicit full barrier.
> >
> >Adding Paul McKenney.
> Just to be safe, let's write down all barrier pairs:
> entry and exit, simple and complex, and switching simple to complex
> and vice versa.
>
> (@Davidlohr: Could you crosscheck, did I overlook a pair?)
>
> 1)
> spin_lock/spin_unlock pair.
If CPU A does spin_unlock(&l1) and CPU B later does spin_lock(&l1),
then CPU B will see all of CPU A's l1-protected accesses. In other
words, locks really do work like they need to. ;-)
However, some other CPU C -not- holding l1 might see once of CPU A's
writes as happening after one of CPU B's reads, but only if that write
and that read are to two different variables. You can prevent this sort
of misordering (as RCU does) by placing smp_mb__after_unlock_lock()
after CPU B's spin_lock(). But the need to prevent this misordering
appears to be rare.
> 2)
I am having some difficulty parsing this, but...
> ||smp_load_acquire(&sma->complex_mode) and
> |||smp_store_release(sma->complex_mode, true) pair.
> ||http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n374
> http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n321
> The store_release guarantees that all data written by the complex_op
> syscall is - after the load_acquire - visible by the simple_op
> syscall.
And smp_store_release() and smp_load_acquire() are quite similar to
spin_unlock() and spin_lock().
If CPU A does smp_store_release(&x) and CPU B does smp_load_acquire(&x),
and CPU B loads the value that CPU A stored to x, or some later value,
then any of CPU B accesses following its smp_load_acquire(&x) will see
all of CPU A's accesses preceding its smp_store_release(&x).
However, some other CPU C that never accessed x might see one of CPU A's
pre-release writes as happening after one of CPU B's post-acquire reads,
but only if that write and that read are to two different variables.
> 3) smp_mb() [after spin_lock()] and
> |||smp_store_mb(sma->complex_mode, true) pair.
> |http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n287
> http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n372
> This are actually two pairs: - Writing the lock variable must
> observed by the task that does spin_unlock_wait() - complex_mode
> must be observed by the task that does the smp_load_acquire()
So this is the smp_store_mb() and the following spin_unlock_wait()
on the one hand and the spin_lock(), the smp_mb(), and the
smp_load_acquire() on the other?
Let's see... If the smp_load_acquire() doesn't see value stored by
smp_store_mb(), then the spin_unlock_wait() is guaranteed to see
the fact that the other CPU holds the lock. The ->slock field is
volatile, so the compiler shouldn't be able to mess us up too badly.
The control dependency is a spinloop, so no ability for the compiler
to move stuff after the end of an "if" to before that "if", which
is good as well.
> 4)
> spin_unlock_wait() and spin_unlock() pair
> http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n291
> http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n409
> The data from the simple op must be observed by the following
> complex op. Right now, there is still an smp_rmb() in line 300: The
> control barrier from the loop inside spin_unlock_wait() is upgraded
> to an acquire barrier by an additional smp_rmb(). Is this smp_rmb()
> required? If I understand commit 2c6100227116 ("locking/qspinlock:
> Fix spin_unlock_wait() some more") right, with this commit qspinlock
> handle this case without the smp_rmb(). What I don't know if powerpc
> is using qspinlock already, or if powerpc works without the
> smp_rmb(). -- Manfred|
And I was taking this into account as well. I believe that this does
what you want it to, and checked it against the current prototype
Linux-kernel memory model with the litmus test shown below, and the
memory model agreed with my assessment. Which might or might not
be worth anything. ;-)
Thanx, Paul
------------------------------------------------------------------------
C C-ManfredSpraul-Sem
(*
* Result: Never
*)
{
}
P0(int *lck, int *complex_mode, int *x, int *y)
{
WRITE_ONCE(*lck, 1); /* only one lock acquirer, so can cheat. */
smp_mb();
r1 = smp_load_acquire(complex_mode);
if (r1 == 0) {
r2 = READ_ONCE(*x);
WRITE_ONCE(*y, 1);
}
smp_store_release(lck, 0);
}
P1(int *lck, int *complex_mode, int *x, int *y)
{
WRITE_ONCE(*complex_mode, 1);
smp_mb();
r3 = READ_ONCE(*lck);
if (r3 == 0) {
smp_rmb();
WRITE_ONCE(*x, 1);
r4 = READ_ONCE(*y);
}
}
exists
(0:r1=0 /\ 1:r3=0 /\ (0:r2=1 /\ 1:r4=0))