Re: futex atomic vs ordering constraints

From: Chris Metcalf
Date: Wed Sep 02 2015 - 13:25:57 EST

On 09/02/2015 01:00 PM, Peter Zijlstra wrote:
> On Wed, Sep 02, 2015 at 12:10:58PM -0400, Chris Metcalf wrote:
>> On 09/02/2015 08:55 AM, Peter Zijlstra wrote:
>>> So here goes..
>>>
>>> Chris, I'm awfully sorry, but I seem to be Tile challenged.
>>>
>>> TileGX seems to define:
>>>
>>> #define smp_mb__before_atomic() smp_mb()
>>> #define smp_mb__after_atomic()  smp_mb()
>>>
>>> However, its atomic_add_return() implementation looks like:
>>>
>>> static inline int atomic_add_return(int i, atomic_t *v)
>>> {
>>> 	int val;
>>> 	smp_mb();  /* barrier for proper semantics */
>>> 	val = __insn_fetchadd4((void *)&v->counter, i) + i;
>>> 	barrier();  /* the "+ i" above will wait on memory */
>>> 	return val;
>>> }
>>>
>>> Which leaves me confused on smp_mb__after_atomic().
>> Are you concerned about whether it has proper memory
>> barrier semantics already, i.e. full barriers before and after?
>> In fact we do have a full barrier before, but then because of the
>> "+ i" / "barrier()", we know that the only other operation since
>> the previous mb(), namely the read of v->counter, has
>> completed after the atomic operation. As a result we can
>> omit explicitly having a second barrier.
>>
>> It does seem like all the current memory-order semantics are
>> correct, unless I'm missing something!
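That ordering argument can be restated as a hedged sketch in portable C11 (illustrative only, not the tile code): the kernel requires atomic_add_return() to behave as a full barrier before and after the RmW, and on tile the trailing fence is supplied implicitly by the "+ i" data dependency plus in-order issue. Spelling both fences out explicitly:

```c
#include <stdatomic.h>

/* Portable sketch of the required semantics; on tile the second
 * fence is replaced by the "+ i" dependency on an in-order core. */
static inline int atomic_add_return_sketch(int i, atomic_int *v)
{
	atomic_thread_fence(memory_order_seq_cst);	/* barrier before */
	int val = atomic_fetch_add_explicit(v, i,
					    memory_order_relaxed) + i;
	atomic_thread_fence(memory_order_seq_cst);	/* barrier after */
	return val;
}
```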
> So I'm reading that code like:
>
> 	[RmW]	ret = *val += i
>
> So what is stopping later memory ops like:
>
> 	[R]	a = *foo
> 	[S]	*bar = b
>
> From getting reordered with the RmW, like:
>
> 	[R]	a = *foo
> 	[S]	*bar = b
>
> 	[RmW]	ret = *val += i
>
> Are you saying Tile does not reorder things like that? If so, why then
> is smp_mb__after_atomic() a full mb(). If it does, I don't see how your
> add_return is correct.
>
> Alternatively I'm just confused..
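To make the concern concrete, here is a hedged C11 sketch (all names hypothetical) of the ordering that must be enforced: the later [R] and [S] must not become visible before the [RmW]. With seq_cst semantics the fetch-add itself provides that full-barrier guarantee:

```c
#include <stdatomic.h>

atomic_int val, foo, bar;
int a, b;

/* Hypothetical sketch of the sequence above: with a seq_cst RmW,
 * the later load and store cannot be reordered before it. */
int demo(void)
{
	int ret = atomic_fetch_add(&val, 1) + 1; /* [RmW] ret = *val += 1 */
	a = atomic_load(&foo);                   /* [R]   a = *foo       */
	atomic_store(&bar, b);                   /* [S]   *bar = b       */
	return ret;
}
```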

Tile does not do out-of-order instruction issue, but it does have an
out-of-order memory subsystem: without a memory barrier, stores become
visible at unpredictable times, and a load may likewise return its
value an unpredictable time after it issues. As a result, a later
instruction that uses a register loaded earlier may stall instruction
issue until the load value is available. A memory fence instruction
makes the core wait until all prior stores are visible and all prior
load values are available.

So [R] can't move up before the [RmW] due to the in-order issue
nature of the processor. And smp_mb__after_atomic() has to
be a full mb() because that's the only barrier we have available
to guarantee that the load has read from memory. (If the
value of the actual atomic were passed to smp_mb__after_atomic(),
we could instead just generate a fake use of the value, basically
something like "move r1, r1", which would cause instruction issue
to halt until the value had been read.)
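That "fake use" idea could look something like the following hypothetical helper (a sketch, not a real kernel API, and assuming GCC-style inline asm). The empty asm with a register input operand forces the compiler to materialize the value in a register at that point; on an in-order core like tile, that use would stall issue until the load completes, which is what the "move r1, r1" trick achieves:

```c
/* Hypothetical helper: forces a use of 'val' at this point.
 * On an in-order core, the use stalls instruction issue until
 * the load producing 'val' has completed, so no full mb() is
 * needed.  This sketch only shows the compiler-level shape. */
static inline void smp_mb__after_atomic_value(int val)
{
	__asm__ volatile("" : : "r"(val) : "memory");
}
```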

Chris Metcalf, EZChip Semiconductor
