Re: Broken ARM atomic ops wrt memory barriers (was: [PATCH] Add cmpxchg support for ARMv6+ systems)

From: Mathieu Desnoyers
Date: Mon May 25 2009 - 11:17:44 EST

* Jamie Lokier (jamie@xxxxxxxxxxxxx) wrote:
> Mathieu Desnoyers wrote:
> > I use a local cmpxchg in the LTTng tree as key instruction to manage the
> > ring buffer in a irq, softirq and NMI-safe way (due to the atomic nature
> > of this instruction), but without the overhead of synchronizing across
> > CPUs.
> >
> > On ARM, the semantic looks a bit like PowerPC with linked load/linked
> > store, and you don't seem to need memory barriers. I guess that's either
> > because
> >
> > a) they are implicit in the ll/ls or
> > b) ARM does not perform out-of-order memory read/writes or
> > c) they've simply been forgotten, and it's a bug.
> >
> > the cmpxchg local instruction maps currently to cmpxchg, as a fallback,
> > since there is no difference between the SMP-aware and UP-only
> > instructions.
> >
> > But if I look at arch/arm/include/asm/spinlock.h, the answer I get seems
> > to be c) : ARM needs memory barriers.
>
> Memory barriers only affect the observed access order with respect to
> other processors (and perhaps other devices).
>
> So a CPU-local operation would not need barriers. CPU-local code,
> including local IRQs, see all memory accesses in the same order as
> executed instructions.
>
> Of course you need barriers at some point, when using the local data
> to update global data seen by other CPUs at a later time. But that is
> hopefully done elsewhere.
>
> After considering this, do you think it's still missing barriers?

I realize I have not made myself clear; I apologise:

- As you say, cmpxchg_local, xchg_local and the local atomic add-return
operations do not need any memory barrier whatsoever, given that they
are CPU-local.

- cmpxchg, xchg and atomic add-return need memory barriers on
architectures which can reorder memory reads/writes as observed by
other CPUs, which seems to include recent ARM architectures. Those
barriers are currently missing on ARM.

Therefore, the standard ARM atomic operations would need to be fixed so
that they provide the memory barrier semantics implied by
Documentation/atomic_ops.txt. The current ARM atomic ops, which lack the
proper memory barriers (and are thus only correct on UP), can then be
used as optimized local ops.
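
To illustrate the distinction, here is a rough userspace sketch using C11
atomics. It is an analogy only, not the ARM kernel code and not a proposed
patch: the function names are hypothetical, and the sequentially
consistent fences merely stand in for the full barriers that the
SMP-aware operation would need around the atomic sequence.

```c
#include <stdatomic.h>

/*
 * Hypothetical stand-in for a CPU-local cmpxchg: the update is atomic,
 * but no ordering is imposed on surrounding memory accesses.
 * Returns the value that was in *ptr before the operation.
 */
static inline long cmpxchg_local_sketch(_Atomic long *ptr, long old,
					long new_val)
{
	long expected = old;

	/* On failure, 'expected' is overwritten with the current value. */
	atomic_compare_exchange_strong_explicit(ptr, &expected, new_val,
						memory_order_relaxed,
						memory_order_relaxed);
	return expected;
}

/*
 * Hypothetical stand-in for the SMP-aware cmpxchg: the same atomic
 * update, with a full fence before and after it so that other CPUs
 * observe surrounding accesses in order.
 */
static inline long cmpxchg_ordered_sketch(_Atomic long *ptr, long old,
					  long new_val)
{
	long prev;

	atomic_thread_fence(memory_order_seq_cst);
	prev = cmpxchg_local_sketch(ptr, old, new_val);
	atomic_thread_fence(memory_order_seq_cst);
	return prev;
}
```

Both variants return the previous value, so a caller checks success by
comparing the return value against the "old" argument it passed in. The
only difference is the pair of fences, which is the cost the local
variant avoids.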

Hopefully this is a bit clearer,



> -- Jamie

Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68