On Wed, 27 Jun 2007, Nick Piggin wrote:
> I don't know why my unlock sequence should be that much slower? Unlocked
> mov vs unlocked add? Definitely in dumb micro-benchmark testing it wasn't
> twice as slow (IIRC).
Oh, that releasing "add" can be unlocked, and only the holder of the lock ever touches that field?
I must not have looked closely enough. In that case, I withdraw that objection, and the sequence-number-based spinlock sounds like a perfectly fine one.
Yes, the add will be slightly slower than the plain byte move, and the locked xadd will be slightly slower than a regular locked add, but compared to the serialization cost, that should be small. For some reason I thought you needed a locked instruction for the unlock too.
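For concreteness, here is a rough sketch of the kind of sequence-number lock being discussed, written with C11 atomics rather than inline asm. It is not Nick's actual patch: the struct layout and the names (ticket_lock, next, owner, ticket_selftest) are made up for illustration, and it assumes an x86 compiler that turns the fetch-add into a locked xadd and the release store into a plain byte store.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout: two byte-sized counters. */
struct ticket_lock {
    _Atomic uint8_t next;   /* next ticket to hand out */
    _Atomic uint8_t owner;  /* ticket currently being served */
};

static void ticket_lock(struct ticket_lock *l)
{
    /* Atomic read-modify-write; on x86 this is normally a "lock xadd". */
    uint8_t me = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

    /* Spin until our ticket comes up; acquire orders the critical section. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
}

static void ticket_unlock(struct ticket_lock *l)
{
    /*
     * Only the lock holder ever writes 'owner', so a plain load followed
     * by a release store is enough; no locked read-modify-write needed.
     */
    uint8_t cur = atomic_load_explicit(&l->owner, memory_order_relaxed);
    atomic_store_explicit(&l->owner, (uint8_t)(cur + 1), memory_order_release);
}

/* Single-threaded sanity check: 300 rounds forces the byte counters to wrap. */
static int ticket_selftest(void)
{
    struct ticket_lock l = { 0, 0 };

    for (int i = 0; i < 300; i++) {
        ticket_lock(&l);
        assert(atomic_load(&l.owner) == (uint8_t)i);
        ticket_unlock(&l);
    }
    /* With no waiters left, both counters must agree again. */
    return atomic_load(&l.next) == atomic_load(&l.owner) ? 0 : 1;
}
```

The self-test only shows that byte wraparound is harmless when uncontended; it says nothing about the P4 or Core 2 timings, which still need a real benchmark.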
So try it with just a byte counter, and test some stupid micro-benchmark on both a P4 and a Core 2 Duo, and if it's in the noise, maybe we can make it the normal spinlock sequence just because it isn't noticeably slower.
In fact, I think an "incb <mem>" instruction is even a byte shorter than "movb $1,<mem>", and with "unlock" being inlined, that could actually be a slight _win_.