Re: [RFC][PATCH 0/5] arch: atomic rework

From: Torvald Riegel
Date: Thu Feb 20 2014 - 12:16:39 EST


On Wed, 2014-02-19 at 16:53 -0800, Linus Torvalds wrote:
> On Tue, Feb 18, 2014 at 11:47 AM, Torvald Riegel <triegel@xxxxxxxxxx> wrote:
> > On Tue, 2014-02-18 at 09:44 -0800, Linus Torvalds wrote:
> >>
> >> Can you point to it? Because I can find a draft standard, and it sure
> >> as hell does *not* contain any clarity of the model. It has a *lot* of
> >> verbiage, but it's pretty much impossible to actually understand, even
> >> for somebody who really understands memory ordering.
> >
> > http://www.cl.cam.ac.uk/~mjb220/n3132.pdf
> > This has an explanation of the model up front, and then the detailed
> > formulae in Section 6. This is from 2010, and there might have been
> > smaller changes since then, but I'm not aware of any bigger ones.
>
> Ahh, this is different from what others pointed at. Same people,
> similar name, but not the same paper.
>
> I will read this version too, but from reading the other one and the
> standard in parallel and trying to make sense of it, it seems that I
> may have originally misunderstood part of the whole control dependency
> chain.
>
> The fact that the left side of "? :", "&&" and "||" breaks data
> dependencies made me originally think that the standard tried very
> hard to break any control dependencies.

Hmm... (I'm not quite sure how to express this properly, so bear with
me.)

The control dependencies as you seem to consider aren't there *as is*,
and removing "? :", "&&", and "||" from what's called
Carries-a-dependency-to in n3132 is basically removing control
dependences in expressions from that relation (ie, so it's just data
dependences).

However, control dependences in the program are not ignored, because
they control what steps the abstract machine would do. If we have
candidate executions with a particular choice for the reads-from
relation, they must make sense in terms of program logic (which does
include conditionals and such), or a candidate execution will not be
consistent. The difference is in how reads-from is constrainted by
different memory orders.

I hope what I mean gets clearer below when looking at the example...

Mark Batty / Peter Sewell: Do you have a better explanation? (And
please correct anything wrong in what I wrote ;)

> Which I felt was insane, when
> then some of the examples literally were about the testing of the
> value of an atomic read. The data dependency matters quite a bit. The
> fact that the other "Mathematical" paper then very much talked about
> consume only in the sense of following a pointer made me think so even
> more.
>
> But reading it some more, I now think that the whole "data dependency"
> logic (which is where the special left-hand side rule of the ternary
> and logical operators come in) are basically an exception to the rule
> that sequence points end up being also meaningful for ordering (ok, so
> C11 seems to have renamed "sequence points" to "sequenced before").

I'm not quite sure what you mean, but I think that this isn't quite the
case. mo_consume and mo_acquire combine with sequenced-before
differently. Have a look at the definition of
inter-thread-happens-before (6.15 in n3132): for synchronizes-with, the
composition with sequenced-before is used (ie, what you'd get with
mo_acquire), whereas for mo_consume, we just have
dependency_ordered_before (so, data dependencies only).

> So while an expression like
>
> atomic_read(p, consume) ? a : b;
>
> doesn't have a data dependency from the atomic read that forces
> serialization, writing
>
> if (atomic_read(p, consume))
> a;
> else
> b;
>
> the standard *does* imply that the atomic read is "happens-before" wrt
> "a", and I'm hoping that there is no question that the control
> dependency still acts as an ordering point.

I don't think it does for mo_consume (but does for mo_acquire). Both
examples do basically the same in the abstract machine, and I think it
would be surprising if rewriting could the code with something seemingly
equivalent would make a semantic difference. AFAICT, "? :" etc. is
removed to allow the standard defining data dependencies as being based
on expressions that take the mo_consume value as input, without also
pulling in control dependencies.

To look at this example in the cppmem tool, we can transform your
example above into:

int main() {
atomic_int x = 0;
int y = 0;
{{{ { y = 1; x.store(1, memory_order_release); }
||| { if (x.load(memory_order_consume)) r2 = y; }
}}};
return 0; }

This has 2 threads, one is doing the store to y (the data) and the
release operation, the other is using consume to try to read the data.

This yields 2 consistent executions (we can read 2 values of x), but the
execution in which we read y has a data race.

I'm not quite sure how the tool parses/models data dependencies in
expressions, but the program (ie, your example) doesn't carry a
dependency from the mo_consume load to the load from y (as defined in C
++ 1.10p9 and n3132 6.13). Thus, the mo_release store is
dependency-ordered-before the mo_consume load ("dob" in the tool's
output graph), but that doesn't extend to the load from y. Therefore,
the load is unordered wrt. to the first thread's "y = 1", and because
they are nonatomic, we get a data race (labeled "dr"). (The labels on
the edges are easier to see in the "dot" Display Layout, even though one
has to then figure out which vertex belongs to which thread.)

If you enable it with the display options, the tool will show a control
dependency, but that is currently unused it seems (see 6.20 in n3132).

If we change mo_consume to mo_acquire in the example, the race goes
away. (This is visible in the inter-thread-happens-before relation,
which can be shown by "removing suppress_edge ithb." under "edit display
options" (but this makes the graph kind of messy)).

BTW, a pretty similar way to expressing this cppmem code above is using
readsvalue() to constrain executions:

int main() {
atomic_int x = 0;
int y = 0;
{{{ { y = 1; x.store(1, memory_order_release); }
||| { r1 = x.load(memory_order_consume).readsvalue(1); r2 = y; }
}}};
return 0; }

> THAT was one of my big confusions, the discussion about control
> dependencies and the fact that the logical ops broke the data
> dependency made me believe that the standard tried to actively avoid
> the whole issue with "control dependencies can break ordering
> dependencies on some CPU's due to branch prediction and memory
> re-ordering by the CPU".
>
> But after all the reading, I'm starting to think that that was never
> actually the implication at all, and the "logical ops breaks the data
> dependency rule" is simply an exception to the sequence point rule.
> All other sequence points still do exist, and do imply an ordering
> that matters for "consume"
>
> Am I now reading it right?
>
> So the clarification is basically to the statement that the "if
> (consume(p)) a" version *would* have an ordering guarantee between the
> read of "p" and "a", but the "consume(p) ? a : b" would *not* have
> such an ordering guarantee. Yes?

Not as I understand it. If my reply above wasn't clear, let me know and
I'll try to rephrase it into something that is.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/