On Tue, 26 Jun 2007, Nick Piggin wrote:
Hmm, not that I have a strong opinion one way or the other, but I
don't know that they would encourage bad code. They are not going to
reduce latency under a locked section, but will improve determinism
in the contended case.
xadd really generally *is* slower than an add. One is often microcoded, the other is not.
But the real problem is that your "unlock" sequence is now about two orders of magnitude slower than it used to be. So it used to be that a spinlocked sequence only had a single synchronization point, now it has two. *That* is really bad, and I guarantee that it makes your spinlocks effectively twice as slow for the non-contended parts.
But your xadd thing might be worth looking at, just to see how expensive it is. As an _alternative_ to spinlocks, it's certainly viable.
(Side note: why make it a word? Word operations are slower on many x86 implementations, because they add yet another prefix. You only need a byte)