Re: [RFC PATCH v2] memory-barriers: remove smp_mb__after_unlock_lock()

From: Paul Mackerras
Date: Mon Aug 31 2015 - 22:57:45 EST


On Thu, Jul 16, 2015 at 03:14:55PM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2015-07-16 at 15:03 +1000, Benjamin Herrenschmidt wrote:
> > On Thu, 2015-07-16 at 12:00 +1000, Michael Ellerman wrote:
> > > That would fix the problem with smp_mb__after_unlock_lock(), but not
> > > the original worry we had about loads happening before the SC in lock.
> >
> > However I think isync fixes *that* :-) The problem with isync is as you
> > said, it's not a -memory- barrier per-se, it's an execution barrier /
> > context synchronizing instruction. The combination stwcx. + bne + isync
> > however prevents the execution of anything past the isync until the
> > stwcx has completed and the bne has been "decided", which prevents loads
> > from leaking into the LL/SC loop. It will also prevent a store in the
> > lock from being issued before the stwcx. has completed. It does *not*
> > prevent as far as I can tell another unrelated store before the lock
> > from leaking into the lock, including the one used to unlock a different
> > lock.
>
> Except that the architecture says:
>
> <<
> Because a Store Conditional instruction may com-
> plete before its store has been performed, a condi-
> tional Branch instruction that depends on the CR0
> value set by a Store Conditional instruction does
> not order the Store Conditional's store with respect
> to storage accesses caused by instructions that
> follow the Branch
> >>
>
> So isync in lock in architecturally incorrect, despite being what the
> architecture recommends using, yay !

That paragraph doesn't say anything about isync. Doing stcx., bne,
isync to acquire a lock will ensure that the instructions inside the
locked region can't execute until the lock is globally known to be
acquired. It is true that another CPU that doesn't take the lock
could see the effect of some load or store inside the locked region
and then see the lock variable itself being clear; but any CPU that
tried to acquire the lock would have its stcx. fail (at least until
the first CPU releases the lock).

So isync in the lock sequence is architecturally correct.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/