Re: [RFC][PATCH 22/31] locking,tile: Implement atomic{,64}_fetch_{add,sub,and,or,xor}()
From: Peter Zijlstra
Date: Tue Apr 26 2016 - 11:29:38 EST
On Mon, Apr 25, 2016 at 04:54:34PM -0400, Chris Metcalf wrote:
> On 4/22/2016 5:04 AM, Peter Zijlstra wrote:
> >  static inline int atomic_add_return(int i, atomic_t *v)
> >  {
> >  	int val;
> >  	smp_mb();  /* barrier for proper semantics */
> >  	val = __insn_fetchadd4((void *)&v->counter, i) + i;
> >  	barrier();  /* the "+ i" above will wait on memory */
> >+	/* XXX smp_mb() instead, as per cmpxchg() ? */
> >  	return val;
> >  }
>
> The existing code is subtle but I'm pretty sure it's not a bug.
>
> The tilegx architecture will take the "+ i" and generate an add instruction.
> The compiler barrier will make sure the add instruction happens before
> anything else that could touch memory, and the microarchitecture will make
> sure that the result of the atomic fetchadd has been returned to the core
> before any further instructions are issued. (The memory architecture is
> lazy, but when you feed a load through an arithmetic operation, we block
> issuing any further instructions until the add's operands are available.)
>
> This would not be an adequate memory barrier in general, since other loads
> or stores might still be in flight, even if the "val" operand had made it
> from memory to the core at this point. However, we have issued no other
> stores or loads since the previous memory barrier, so we know that there
> can be no other loads or stores in flight, and thus the compiler barrier
> plus arithmetic op is equivalent to a memory barrier here.
>
> In hindsight, perhaps a more substantial comment would have been helpful
> here. Unless you see something missing in my analysis, I'll plan to go
> ahead and add a suitable comment here :-)
>
> Otherwise, though just based on code inspection so far:
>
> Acked-by: Chris Metcalf <cmetcalf@xxxxxxxxxxxx> [for tile]

Thanks!

Just to verify; the new fetch-op thingies _do_ indeed need the extra
smp_mb() as per my patch, because there is no trailing instruction
depending on the completion of the load?
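
To make the distinction concrete, here is a rough portable sketch of the
two shapes being discussed, written with C11 atomics rather than the tile
intrinsics. Everything named sketch_* is hypothetical and exists only for
illustration; atomic_thread_fence(memory_order_seq_cst) stands in for
smp_mb(). Portable C has no equivalent of the tilegx "+ i" dependency
argument, so both variants fence explicitly after the operation. On tilegx
the add_return variant is argued above to get away with barrier() alone,
while the fetch variant returns the loaded value untouched and so has no
dependent instruction to lean on.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's atomic_t. */
typedef struct { atomic_int counter; } sketch_atomic_t;

/* Returns the NEW value: the "+ i" consumes the result of the fetchadd. */
static inline int sketch_atomic_add_return(int i, sketch_atomic_t *v)
{
	int val;

	/* Stands in for the leading smp_mb(). */
	atomic_thread_fence(memory_order_seq_cst);
	val = atomic_fetch_add_explicit(&v->counter, i, memory_order_relaxed) + i;
	/*
	 * On tilegx, barrier() is argued to suffice here because the add
	 * stalls until the loaded value arrives. This portable sketch
	 * cannot rely on that, so it uses a full fence.
	 */
	atomic_thread_fence(memory_order_seq_cst);
	return val;
}

/* Returns the OLD value: nothing depends on the load before we return. */
static inline int sketch_atomic_fetch_add(int i, sketch_atomic_t *v)
{
	int val;

	atomic_thread_fence(memory_order_seq_cst);
	val = atomic_fetch_add_explicit(&v->counter, i, memory_order_relaxed);
	/* Stands in for the trailing smp_mb() the patch adds. */
	atomic_thread_fence(memory_order_seq_cst);
	return val;
}

/* Tiny single-threaded check of the return-value conventions only. */
void sketch_demo(void)
{
	sketch_atomic_t v;

	atomic_init(&v.counter, 5);
	assert(sketch_atomic_fetch_add(3, &v) == 5);	/* old value */
	assert(sketch_atomic_add_return(2, &v) == 10);	/* new value */
}
```

Note that sketch_demo() only checks which value each variant returns; it
says nothing about ordering, which cannot be demonstrated from a single
thread.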