Anyway, here is a disassembly of some of the generated code, with my comments:
c00000000049bf9c <.mutex_lock>:
c00000000049bf9c: 7c 00 06 ac eieio
c00000000049bfa0: 7d 20 18 28 lwarx r9,r0,r3
c00000000049bfa4: 31 29 ff ff addic r9,r9,-1
The eieio is completely unnecessary; it got picked up from atomic_dec_return (Anton, why is there an eieio at the start of atomic_dec_return in the first place?).
A mutex is like a spinlock: it must prevent loads and stores within the critical section from 'leaking outside the critical section' [they must not be reordered to before the mutex_lock(), nor to after the mutex_unlock()]. Hence the barriers added by atomic_dec_return() are very much needed.
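
To make the ordering requirement concrete, here is a rough userspace sketch in C11 atomics. It is not the kernel's mutex code: toy_lock, toy_trylock and toy_unlock are made-up names, and the 1 = unlocked / 0 = locked encoding is only an assumption for illustration. The acquire ordering on the lock side keeps critical-section accesses from moving before the lock is taken, and the release ordering on the unlock side keeps them from moving after the lock is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical toy lock, NOT the kernel's struct mutex.
 * Assumed encoding: 1 = unlocked, 0 = locked. */
struct toy_lock {
    atomic_int count;
};

static void toy_lock_init(struct toy_lock *l)
{
    atomic_init(&l->count, 1);
}

/* Try to take the lock without blocking. On success the acquire
 * ordering stops loads and stores in the critical section from
 * being reordered to before this point. */
static bool toy_trylock(struct toy_lock *l)
{
    int expected = 1;

    return atomic_compare_exchange_strong_explicit(
        &l->count, &expected, 0,
        memory_order_acquire,   /* ordering on success */
        memory_order_relaxed);  /* ordering on failure */
}

/* Drop the lock. The release ordering stops loads and stores in
 * the critical section from being reordered to after this store. */
static void toy_unlock(struct toy_lock *l)
{
    /* Debug check: unlocking a lock that is not held is a bug. */
    assert(atomic_load_explicit(&l->count, memory_order_relaxed) == 0);
    atomic_store_explicit(&l->count, 1, memory_order_release);
}
```

Note that this sketch only asks for one-way barriers (acquire on lock, release on unlock), which is the minimum the "no leaking" rule needs. Whether a heavier barrier in front of the atomic operation is also required is exactly the question being argued above, and the sketch does not settle it.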