Re: spin_lock implicit/explicit memory barrier

From: Davidlohr Bueso
Date: Wed Aug 10 2016 - 19:31:18 EST


On Wed, 10 Aug 2016, Paul E. McKenney wrote:

On Wed, Aug 10, 2016 at 03:23:16PM -0700, Davidlohr Bueso wrote:
On Wed, 10 Aug 2016, Paul E. McKenney wrote:

>On Wed, Aug 10, 2016 at 08:21:22PM +0200, Manfred Spraul wrote:

>> 4)
>>spin_unlock_wait() and spin_unlock() pair
>>http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n291
>>http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n409
>>The data from the simple op must be observed by the following
>>complex op. Right now, there is still an smp_rmb() in line 300: The
>>control barrier from the loop inside spin_unlock_wait() is upgraded
>>to an acquire barrier by an additional smp_rmb(). Is this smp_rmb()
>>required? If I understand commit 2c6100227116 ("locking/qspinlock:
>>Fix spin_unlock_wait() some more") right, with this commit qspinlock
>>handles this case without the smp_rmb(). What I don't know is whether
>>powerpc is using qspinlock already, or if powerpc works without the
>>smp_rmb(). -- Manfred

No, ppc doesn't use qspinlocks, but as mentioned, spin_unlock_wait() for
ticket locks is now at least an acquire (ppc is stronger), which matches the
unlock store-release you are concerned about; this is as of 726328d92a4
(locking/spinlock, arch: Update and fix spin_unlock_wait() implementations).

This is exactly what you are doing by upgrading the ctrl dependency
to the acquire barrier in
http://git.cmpxchg.org/cgit.cgi/linux-mmots.git/tree/ipc/sem.c#n291
and therefore we don't need it explicitly -- it also makes the comment
wrt spin_unlock_wait obsolete. Or am I misunderstanding you?

Ah, I was looking at 4.7 rather than current mainline. Perhaps Manfred
was doing the same.

Right, and therefore backporting gets icky as any versions < 4.8 will
require this explicit smp_rmb :-( Given that this complex vs simple
ops race goes way back to 3.12, I see these options:

(1) As Manfred suggested, have a patch 1 that fixes the race against mainline
with the redundant smp_rmb, then apply a second patch that gets rid of it
for mainline, but only backport the original patch 1 down to 3.12.

(2) Backport 726328d92a4 all the way down to 3.12.

(3) Have two patches, one for upstream and one for backporting (not sure how
that would fly though).

I'm in favor of (1) as it seems the least error prone, as long as we do get
rid of the redundant barrier in mainline. Any smp_mb__after_unlock_lock()
calls we end up needing for ppc would probably need backporting as is,
afaict.
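
For anyone following along, here is a rough userspace sketch of the pre-4.8
pattern discussed above: a spin_unlock_wait()-style loop that only provides a
control dependency, upgraded to acquire by an explicit barrier so that it
pairs with the unlock's store-release. This is not the ipc/sem.c code. It
assumes C11 atomics and pthreads, uses an acquire fence as the closest
portable stand-in for the kernel's smp_rmb(), and every name in it
(toy_unlock, toy_unlock_wait, simple_op, complex_op, run_demo) is
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration only; not kernel code. */
static atomic_bool locked = true;  /* "lock" starts out held by the simple op */
static int sem_data;               /* plain data written while the lock is held */

/* Analogue of spin_unlock(): a store-release on the lock word. */
static void toy_unlock(void)
{
	atomic_store_explicit(&locked, false, memory_order_release);
}

/*
 * Analogue of the pre-4.8 spin_unlock_wait() followed by smp_rmb():
 * the relaxed spin loop alone gives no acquire ordering, so an explicit
 * acquire fence is added after it (assumed stand-in for smp_rmb()).
 */
static void toy_unlock_wait(void)
{
	while (atomic_load_explicit(&locked, memory_order_relaxed))
		;	/* spin until the lock word is observed free */
	atomic_thread_fence(memory_order_acquire);
}

/* "Simple op": update data under the lock, then release it. */
static void *simple_op(void *arg)
{
	(void)arg;
	sem_data = 42;
	toy_unlock();
	return NULL;
}

/* "Complex op": wait for the lock to be free, then read the data. */
static int complex_op(void)
{
	toy_unlock_wait();
	return sem_data;
}

/* Runs one simple op against one complex op; returns the value observed. */
int run_demo(void)
{
	pthread_t t;
	int seen;

	atomic_store(&locked, true);
	sem_data = 0;
	if (pthread_create(&t, NULL, simple_op, NULL) != 0)
		return -1;
	seen = complex_op();
	pthread_join(t, NULL);
	return seen;
}
```

Provided the thread is created, run_demo() returns 42: the acquire fence
after the loop synchronizes with the store-release in toy_unlock(), so the
complex op must observe the simple op's write. Dropping the fence leaves the
read of sem_data unordered against that write, which is roughly the
situation on kernels without 726328d92a4.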

Thanks,
Davidlohr