Re: [RFC][PATCH 0/5] arch: atomic rework

From: Torvald Riegel
Date: Wed Feb 12 2014 - 00:14:39 EST


On Sun, 2014-02-09 at 19:51 -0800, Paul E. McKenney wrote:
> On Mon, Feb 10, 2014 at 01:06:48AM +0100, Torvald Riegel wrote:
> > On Thu, 2014-02-06 at 20:20 -0800, Paul E. McKenney wrote:
> > > On Fri, Feb 07, 2014 at 12:44:48AM +0100, Torvald Riegel wrote:
> > > > On Thu, 2014-02-06 at 14:11 -0800, Paul E. McKenney wrote:
> > > > > On Thu, Feb 06, 2014 at 10:17:03PM +0100, Torvald Riegel wrote:
> > > > > > On Thu, 2014-02-06 at 11:27 -0800, Paul E. McKenney wrote:
> > > > > > > On Thu, Feb 06, 2014 at 06:59:10PM +0000, Will Deacon wrote:
> > > > > > > > There are also so many ways to blow your head off it's untrue. For example,
> > > > > > > > cmpxchg takes a separate memory model parameter for failure and success, but
> > > > > > > > then there are restrictions on the sets you can use for each. It's not hard
> > > > > > > > to find well-known memory-ordering experts shouting "Just use
> > > > > > > > memory_order_seq_cst for everything, it's too hard otherwise". Then there's
> > > > > > > > the fun of load-consume vs load-acquire (arm64 GCC completely ignores consume
> > > > > > > > atm and optimises all of the data dependencies away) as well as the definition
> > > > > > > > of "data races", which seem to be used as an excuse to miscompile a program
> > > > > > > > at the earliest opportunity.
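
For illustration, a minimal C++11 sketch of the cmpxchg interface being
described here (names are made up): compare_exchange takes separate memory
orders for the success and failure cases, and C++11 restricts which failure
orders are permitted.

    #include <atomic>

    std::atomic<int> lock{0};

    bool try_lock()
    {
            int expected = 0;
            /* Separate memory orders for success and failure.  In C++11 the
             * failure order may not be memory_order_release or acq_rel, and
             * may not be stronger than the success order. */
            return lock.compare_exchange_strong(expected, 1,
                                                std::memory_order_acquire,
                                                std::memory_order_relaxed);
    }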
> > > > > > >
> > > > > > > Trust me, rcu_dereference() is not going to be defined in terms of
> > > > > > > memory_order_consume until the compilers implement it both correctly and
> > > > > > > efficiently. They are not there yet, and there is currently no shortage
> > > > > > > of compiler writers who would prefer to ignore memory_order_consume.
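
For reference, a rough sketch (not the kernel's actual definition) of what an
rcu_dereference() expressed in C++11 terms would look like, and what compilers
currently tend to do with it:

    #include <atomic>

    struct node { int val; };
    std::atomic<node *> head{nullptr};

    int reader()
    {
            /* Assume a writer has already published a node via
             * head.store(n, std::memory_order_release). */
            node *p = head.load(std::memory_order_consume);

            /* The intent: the dependent load below is ordered after the
             * consume load without a barrier on most architectures.  In
             * practice, current compilers typically promote the consume
             * load to memory_order_acquire instead of tracking the
             * dependency. */
            return p->val;
    }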
> > > > > >
> > > > > > Do you have any input on
> > > > > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448? In particular, the
> > > > > > language standard's definition of dependencies?
> > > > >
> > > > > Let's see... 1.10p9 says that a dependency must be carried unless:
> > > > >
> > > > > - B is an invocation of any specialization of std::kill_dependency (29.3), or
> > > > > - A is the left operand of a built-in logical AND (&&, see 5.14) or logical OR (||, see 5.15) operator, or
> > > > > - A is the left operand of a conditional (?:, see 5.16) operator, or
> > > > > - A is the left operand of the built-in comma (,) operator (5.18);
> > > > >
> > > > > So the use of "flag" before the "?" is ignored. But the "flag - flag"
> > > > > after the "?" will carry a dependency, so the code fragment in 59448
> > > > > needs to do the ordering rather than just optimizing "flag - flag" out
> > > > > of existence. One way to do that on both ARM and Power is to actually
> > > > > emit code for "flag - flag", but there are a number of other ways to
> > > > > make that work.
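
A hypothetical reconstruction (not the exact PR 59448 test case) of the kind
of fragment being discussed, using C++11 atomics:

    #include <atomic>

    std::atomic<int> flag_var{0};

    int reader(int *p)
    {
            int flag = flag_var.load(std::memory_order_consume);

            /* Per 1.10p9, 'flag' as the left operand of ?: does NOT carry a
             * dependency, but 'flag - flag' in the selected arm DOES, even
             * though it always evaluates to zero.  The load through 'q' must
             * therefore stay ordered after the consume load. */
            int *q = flag ? (p + flag - flag) : p;
            return *q;
    }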
> > > >
> > > > And that's what would concern me, considering that these requirements
> > > > seem to be able to creep out easily. Also, whereas the other atomics
> > > > just constrain compilers wrt. reordering across atomic accesses or
> > > > changes to the atomic accesses themselves, the dependencies are new
> > > > requirements on pieces of otherwise non-synchronizing code. The latter
> > > > seems far more involved to me.
> > >
> > > Well, the wording of 1.10p9 is pretty explicit on this point.
> > > There are only a few exceptions to the rule that dependencies from
> > > memory_order_consume loads must be tracked. And to your point about
> > > requirements being placed on pieces of otherwise non-synchronizing code,
> > > we already have that with plain old load acquire and store release --
> > > both of these put ordering constraints that affect the surrounding
> > > non-synchronizing code.
> >
> > I think there's a significant difference. With acquire/release or more
> > general memory orders, it's true that we can't reorder _across_ the atomic
> > access. However, the surrounding code can still be reordered and optimized
> > without additional constraints, as long as it isn't moved across the atomic
> > access. This is not the case with consume memory order, as the
> > (p + flag - flag) example shows.
>
> Agreed, memory_order_consume does introduce additional restrictions.
>
> > > This issue got a lot of discussion, and the compromise is that
> > > dependencies cannot leak into or out of functions unless the relevant
> > > parameters or return values are annotated with [[carries_dependency]].
> > > This means that the compiler can see all the places where dependencies
> > > must be tracked. This is described in 7.6.4.
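
A minimal sketch of how 7.6.4 is intended to be used (hypothetical function
names; note that current GCC simply ignores the attribute):

    #include <atomic>

    struct node { int val; };
    std::atomic<node *> head{nullptr};

    /* The attribute says the dependency from the consume load flows out
     * through the return value and in through the parameter, so no barrier
     * is needed at the call boundary. */
    [[carries_dependency]] node *get_node()
    {
            return head.load(std::memory_order_consume);
    }

    int use_node([[carries_dependency]] node *p)
    {
            return p->val;   /* ordered via the carried dependency */
    }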
> >
> > I wasn't aware of 7.6.4 (but 1.10 doesn't refer to it as the additional
> > constraint that it is, so I guess at least that should be fixed).
> > Also, AFAIU, 7.6.4p3 is wrong in that the attribute does make a semantic
> > difference, at least if one assumes that normal optimization of
> > sequential code is the default and that maintaining things such as
> > (flag - flag) is not; if optimizing away (flag - flag) required the
> > insertion of fences unless the carries_dependency attribute is present,
> > that would be bad, I think.
>
> No, the attribute does not make a semantic difference. If a dependency
> flows into a function without [[carries_dependency]], the implementation
> is within its rights to emit an acquire barrier or similar.

So you can't just ignore the attribute when generating code; you would
at the very least have to ignore it consistently across all the compilers
that compile parts of your code. That tells me that it does make a
semantic difference. But I know that what should or should not be an
attribute is controversial.

> > IMHO, the dependencies construct (carries_dependency, kill_dependency)
> > seem to be backwards to me. They assume that the compiler preserves all
> > those dependencies including (flag-flag) by default, which prohibits
> > meaningful optimizations. Instead, I guess there should be a construct
> > for explicitly exploiting the dependencies (e.g., a
> > preserve_dependencies call, whose argument will not be optimized fully).
> > Exploiting dependencies will be special code and isn't the common case,
> > so it can require additional annotations.
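
For contrast, the existing "blacklist" construct from 29.3,
std::kill_dependency, explicitly ends a dependency chain (sketch):

    #include <atomic>

    std::atomic<int> flag_var{0};
    int data[2] = {0, 0};

    int reader()
    {
            int flag = flag_var.load(std::memory_order_consume);

            /* kill_dependency ends the chain: the compiler may fold
             * 'k - k' to zero and need not order the access through 'q'
             * against the consume load. */
            int k = std::kill_dependency(flag);
            int *q = data + k - k;
            return *q;
    }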
>
> If you are compiling a function that has no [[carries_dependency]]
> attributes on its arguments and return value, and none on any of the
> functions that it calls, and contains no memory_order_consume loads,
> then you can break dependencies all you like within that function.
>
> That said, I am of course open to discussing alternative approaches.
> Especially those that ease the migration of the existing code in the
> Linux kernel that relies on dependencies. ;-)
>
> > > If a dependency chain
> > > headed by a memory_order_consume load goes into or out of a function
> > > without the aid of the [[carries_dependency]] attribute, the compiler
> > > needs to do something else to enforce ordering, e.g., emit a memory
> > > barrier.
> >
> > I agree that this is a way to see it. But I can't see how this will
> > motivate compiler implementers to not just emit a stronger barrier right
> > away.
>
> That certainly has been the most common approach.
>
> > > From a Linux-kernel viewpoint, this is a bit ugly, as it requires
> > > annotations and use of kill_dependency, but it was the best I could do
> > > at the time. If things go as they usually do, there will be some other
> > > reason why those are needed...
> >
> > Did you consider something along the lines of a "preserve_dependencies"
> > call? If so, why did you go for kill_dependency?
>
> Could you please give more detail on what a "preserve_dependencies" call
> would do and where it would be used?

It would demarcate regions of code (or particular expressions) that
require the preservation of source-code-level dependencies (e.g.,
"p + flag - flag"), and which thus constrain the optimizations allowed
on otherwise normal code.

What we have right now is a "blacklist" of things that kill
dependencies, which still requires compilers to preserve things like
"flag - flag". Preserving such dependencies isn't useful in most code,
so it would be better to have a "whitelist" of expressions in which
dependencies are strictly preserved. The list of relevant expressions
found in current RCU uses might be one such whitelist. Another option
would be to require programmers to explicitly annotate the expressions
whose dependencies should be specially preserved, with something like a
"preserve_dependencies(p + flag - flag)" call.

I agree that this might be harder when dependency tracking is scattered
throughout a larger region of code, as you pointed out today. But I'm
just looking for a setting that is more likely to see good support from
compilers.
