On Wed, Dec 06, 2006 at 11:16:55AM -0800, Christoph Lameter wrote:
No. If you read what I said, you'll see that you can _cheaply_ use
cmpxchg in a ll/sc based implementation. Take an atomic increment
operation.
do {
old = load_locked(addr);
} while (store_exclusive(old, old + 1, addr);
On a cmpxchg, that "store_exclusive" (loosely) becomes your cmpxchg
instruction, comparing the first arg, and if equal storing the second.
The "load_locked" macro becomes a standard pointer deref. Ergo, x86
becomes:
do {
load value
manipulate it
conditional store
} while not stored
On ll/sc, the load_locked() macro is the load locked instruction. The
store_exclusive() macro is the exclusive store and it doesn't need to
use the first parameter at all. Ergo, ARM becomes:
do {
ldrex r1, [r2]
manipulate r1
strex r0, r1, [r2]
} while failed
Notice that both are optimal.
Now let's consider the cmpxchg case.
do {
val = *addr;
} while (cmpxchg(val, val + 1, addr);
The x86 case is _identical_ to the ll/sc based implementation. Absolutely
entirely. No impact what so ever.
Let's look at the ll/sc case. The cmpxchg code implemented on this has
to reload the original value, compare it, if equal store the new value.
So:
do {
val = *addr;
(r2 = addr, ldrex r1, [r2]
compare r1, r0
strexeq r4, r3, [r2] (store exclusive if equal)
} while store failed or comparecondition failed
Note how the cmpxchg has _forced_ the ll/sc implementation to become
more complex.
So, let's recap.
Implementing ll/sc based accessor macros allows both ll/sc _and_ cmpxchg
architectures to produce optimal code.
Implementing an cmpxchg based accessor macro allows cmpxchg architectures
to produce optimal code and ll/sc non-optimal code.
See my point?