Re: Broken ARM atomic ops wrt memory barriers (was: [PATCH] Add cmpxchg support for ARMv6+ systems)
From: Mathieu Desnoyers
Date: Mon May 25 2009 - 16:23:47 EST
* Russell King - ARM Linux (linux@xxxxxxxxxxxxxxxx) wrote:
> This reply is based upon what's in your email rather than atomic_ops.
> On Mon, May 25, 2009 at 01:29:55PM -0400, Mathieu Desnoyers wrote:
> > This is a very good start, but I think a few are still missing.
> > In atomic.h:
> > /* Atomic operations are already serializing on ARM */
> > #define smp_mb__before_atomic_dec() barrier()
> > #define smp_mb__after_atomic_dec() barrier()
> > #define smp_mb__before_atomic_inc() barrier()
> > #define smp_mb__after_atomic_inc() barrier()
> > should probably map to smp_mb() for ARMv6+.
> BTW, I think you're wrong here. atomic_dec() and atomic_inc() are
> implemented using atomic_add_return() and atomic_sub_return(). Both
> of these functions are serializing as a result of the patch you
> replied to.
Hrm, then atomic_inc() and atomic_dec() should not be implemented on top of
the add/sub-return variants that carry memory barriers, given this would kill
performance for no reason. It would be better to re-implement barrier-less
inc/dec primitives and build them on those instead.
But you are right: it's not a bug, just... very slow.
> > Also, bitops.h should have: (taken from powerpc)
> > /*
> > * clear_bit doesn't imply a memory barrier
> > */
> > #define smp_mb__before_clear_bit() smp_mb()
> > #define smp_mb__after_clear_bit() smp_mb()
> Again, disagree. With the current definition being mb(), they become
> - a compiler barrier on UP architectures (which don't have weak ordering)
> - a data memory barrier on UP coherent xscale (don't know if this has
> weak ordering)
> - a data memory barrier on SMP
> So, I think no change is required; mb() is doing at least the right thing.
> (Whether it's heavier than it actually needs to be is another question,
> and that only affects the coherent xscale stuff. That is out of my
> knowledge to answer.)
Right, no bug here, only probably much slower.
To give you an order of magnitude: a cmpxchg_local primitive without the
memory barriers takes 11 cycles on my ARMv7 OMAP3. A cmpxchg with mb()
before and after takes 71 cycles. That's 6.45 times slower.
> > According to atomic_ops.txt, 3 other bitwise atomic ops imply memory
> > barriers :
> > "There are two special bitops with lock barrier semantics (acquire/release,
> > same as spinlocks). These operate in the same way as their non-_lock/unlock
> > postfixed variants, except that they are to provide acquire/release semantics,
> > respectively. This means they can be used for bit_spin_trylock and
> > bit_spin_unlock type operations without specifying any more barriers.
> > int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
> > void clear_bit_unlock(unsigned long nr, unsigned long *addr);
> > void __clear_bit_unlock(unsigned long nr, unsigned long *addr);
> > The __clear_bit_unlock version is non-atomic, however it still implements
> > unlock barrier semantics. This can be useful if the lock itself is protecting
> > the other bits in the word."
> It looks to me that if we make arch/arm/lib/bitops.h fully ordered then
> these get sorted out for free.
Yes, that was my first thought too. I think all these implementations use
bitops.h.
> > arch/arm/include/asm/mutex.h should also have smp_mb() to provide
> > acquire/release semantic to mutex fastpath (like spinlock does),
> > otherwise subtle deadlocks and various problems could occur.
> Hmm, the mutex is undocumented in the atomic ops document. Does it
> require ordering both before and after, or do some of those ops just
> need it before acquire and after release?
I guess the mutex fast path should probably be added to atomic_ops.txt.
If I look at PowerPC mutex.h, mutex lock provides acquire semantics (like
spinlock) and mutex unlock provides release semantics (like spin unlock):

  mutex lock      (acquire barrier)
  (critical section memory accesses)
  mutex unlock    (release barrier)