Re: [PATCH v3 3/5] MCS Lock: Barrier corrections

From: Michel Lespinasse
Date: Thu Nov 07 2013 - 04:55:49 EST


On Wed, Nov 6, 2013 at 5:39 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> Sorry about the HTML crap, the internet connection is too slow for my normal
> email habits, so I'm using my phone.
>
> I think the barriers are still totally wrong for the locking functions.
>
> Adding an smp_rmb after waiting for the lock is pure BS. Writes in the
> locked region could percolate out of the locked region.
>
> The thing is, you cannot do the memory ordering for locks in any sane
> generic way. Not using our current barrier system. On x86 (and many others)
> the smp_rmb will work fine, because writes are never moved earlier. But on
> other architectures you really need an acquire to get a lock efficiently. No
> separate barriers. An acquire needs to be on the instruction that does the
> lock.
>
> Same goes for unlock. On x86 any store is a fine unlock, but on other
> architectures you need a store with a release marker.
>
> So no amount of barriers will ever do this correctly. Sure, you can add full
> memory barriers and it will be "correct" but it will be unbearably slow, and
> add totally unnecessary serialization. So *correct* locking will require
> architecture support.

Rather than writing arch-specific locking code, would you agree to
introducing acquire and release memory operations?

The semantics of an acquire memory operation would be: the specified
memory operation occurs, and any reads or writes issued after it are
guaranteed not to be reordered before it (useful for implementing lock
acquisitions).
The semantics of a release memory operation would be: the specified
memory operation occurs, and any reads or writes issued before it are
guaranteed not to be reordered after it (useful for implementing lock
releases).
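
To make this more concrete, here is a small sketch, written in C11
atomic notation purely for illustration (the kernel would of course
provide its own primitives), of how the two operations pair up in the
classic message-passing case:

#include <stdatomic.h>

static int data;
static atomic_int ready;

void producer(void)
{
	data = 42;				/* plain write */
	/* release: the write to data cannot be reordered after this */
	atomic_store_explicit(&ready, 1, memory_order_release);
}

int consumer(void)
{
	/* acquire: the read of data cannot be reordered before this */
	while (!atomic_load_explicit(&ready, memory_order_acquire))
		;				/* spin */
	return data;				/* guaranteed to see 42 */
}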

Now each arch would still need to define several acquire and release
operations, but this would give us quite a useful model to build
generic code on.
For example, the fast path for the x86 spinlock implementation could
be expressed generically as an acquire fetch-and-add (for
__ticket_spin_lock) and a release add (for __ticket_spin_unlock).
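
As a rough sketch of what that generic code could look like (again in
C11 notation just to show the shape; the load in the contended spin
would also want acquire ordering, and real kernel code would use
arch-provided primitives instead):

#include <stdatomic.h>

struct ticket_lock {
	atomic_ushort next;	/* next ticket to hand out */
	atomic_ushort owner;	/* ticket currently being served */
};

static inline void ticket_lock(struct ticket_lock *lock)
{
	/* acquire fetch-and-add: later accesses cannot move before it */
	unsigned short mine = atomic_fetch_add_explicit(&lock->next, 1,
						memory_order_acquire);

	/* contended case: the load observing ownership is also acquire */
	while (atomic_load_explicit(&lock->owner,
				    memory_order_acquire) != mine)
		;	/* spin */
}

static inline void ticket_unlock(struct ticket_lock *lock)
{
	/* release add: earlier accesses cannot move after it */
	atomic_fetch_add_explicit(&lock->owner, 1, memory_order_release);
}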

Do you think this is a useful direction to move in?

Thanks,

--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.