Re: [RFC][PATCH 0/5] arch: atomic rework

From: Paul E. McKenney
Date: Tue Feb 18 2014 - 14:47:59 EST


On Tue, Feb 18, 2014 at 10:49:27AM -0800, Linus Torvalds wrote:
> On Tue, Feb 18, 2014 at 10:21 AM, Peter Sewell
> <Peter.Sewell@xxxxxxxxxxxx> wrote:
> >
> > This is a bit more subtle, because (on ARM and POWER) removing the
> > dependency and conditional branch is actually in general *not* equivalent
> > in the hardware, in a concurrent context.
>
> So I agree, but I think that's a generic issue with non-local memory
> ordering, and is not at all specific to the optimization wrt that
> "x?42:42" expression.
>
> If you have a value that you loaded with a non-relaxed load, and you
> pass that value off to a non-local function that you don't know what
> it does, in my opinion that implies that the compiler had better add
> the necessary serialization to say "whatever that other function does,
> we guarantee the semantics of the load".
>
> So on ppc, if you do a load with "consume" or "acquire" and then call
> another function without having had something in the caller that
> serializes the load, you'd better add the lwsync or whatever before
> the call. Exactly because the function call itself otherwise basically
> breaks the visibility into ordering. You've basically turned a
> load-with-ordering-guarantees into just an integer that you passed off
> to something that doesn't know about the ordering guarantees - and you
> need that "lwsync" in order to still guarantee the ordering.
>
> Tough titties. That's what a CPU with weak memory ordering semantics
> gets in order to have sufficient memory ordering.

And that is in fact what C11 compilers are supposed to do if the function
doesn't have the [[carries_dependency]] attribute on the corresponding
argument or return of the non-local function. If the function is marked
with [[carries_dependency]], then the compiler has the information needed
in both compilations to make things work correctly.

Thanx, Paul

> And I don't think it's actually a problem in practice. If you are
> doing loads with ordered semantics, you're not going to pass the
> result off willy-nilly to random functions (or you really *do* require
> the ordering, because the load that did the "acquire" was actually for
> a lock!
>
> So I really think that the "local optimization" is correct regardless.
>
> Linus
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/