Re: [RFC][PATCH 0/5] arch: atomic rework

From: Torvald Riegel
Date: Mon Feb 17 2014 - 17:10:34 EST


On Sat, 2014-02-15 at 11:15 -0800, Linus Torvalds wrote:
> On Sat, Feb 15, 2014 at 9:30 AM, Torvald Riegel <triegel@xxxxxxxxxx> wrote:
> >
> > I think the example is easy to misunderstand, because the context isn't
> > clear. Therefore, let me first try to clarify the background.
> >
> > (1) The abstract machine does not write speculatively.
> > (2) Emitting a branch instruction and executing a branch at runtime is
> > not part of the specified behavior of the abstract machine. Of course,
> > the abstract machine performs conditional execution, but that just
> > specifies the output / side effects that it must produce (e.g., volatile
> > stores) -- not with which hardware instructions it is producing this.
> > (3) A compiled program must produce the same output as if executed by
> > the abstract machine.
>
> Ok, I'm fine with that.
>
> > Thus, we need to be careful what "speculative store" is meant to refer
> > to. A few examples:
> >
> > if (atomic_load(&x, mo_relaxed) == 1)
> > atomic_store(&y, 3, mo_relaxed));
>
> No, please don't use this idiotic example. It is wrong.

It won't be useful in practice in a lot of cases, but that doesn't mean
it's wrong. It's clearly not illegal code. It also serves a purpose: a
simple example to reason about a few aspects of the memory model.

> The fact is, if a compiler generates anything but the obvious sequence
> (read/cmp/branch/store - where branch/store might obviously be done
> with some other machine conditional like a predicate), the compiler is
> wrong.

Why? I've reasoned why (1) to (3) above allow in certain cases (i.e.,
the first load always returning 1) for the branch (or other machine
conditional) to not be emitted. So please either poke holes into this
reasoning, or clarify that you don't in fact, contrary to what you wrote
above, agree with (1) to (3).

> Anybody who argues anything else is wrong, or confused, or confusing.

I appreciate your opinion, and maybe I'm just one of the three things
above (my vote is on "confusing"). But without you saying why doesn't
help me see what's the misunderstanding here.

> Instead, argue about *other* sequences where the compiler can do something.

I'd prefer if we could clarify the misunderstanding for the simple case
first that doesn't involve stronger ordering requirements in the form of
non-relaxed MOs.

> For example, this sequence:
>
> atomic_store(&x, a, mo_relaxed);
> b = atomic_load(&x, mo_relaxed);
>
> can validly be transformed to
>
> atomic_store(&x, a, mo_relaxed);
> b = (typeof(x)) a;
>
> and I think everybody agrees about that. In fact, that optimization
> can be done even for mo_strict.

Yes.

> But even that "obvious" optimization has subtle cases. What if the
> store is relaxed, but the load is strict? You can't do the
> optimization without a lot of though, because dropping the strict load
> would drop an ordering point. So even the "store followed by exact
> same load" case has subtle issues.

Yes if a compiler wants to optimize that, it has to give it more
thought. My gut feeling is that either the store should get the
stronger ordering, or the accesses should be merged. But I'd have to
think more about that one (which I can do on request).

> With similar caveats, it is perfectly valid to merge two consecutive
> loads, and to merge two consecutive stores.
>
> Now that means that the sequence
>
> atomic_store(&x, 1, mo_relaxed);
> if (atomic_load(&x, mo_relaxed) == 1)
> atomic_store(&y, 3, mo_relaxed);
>
> can first be optimized to
>
> atomic_store(&x, 1, mo_relaxed);
> if (1 == 1)
> atomic_store(&y, 3, mo_relaxed);
>
> and then you get the end result that you wanted in the first place
> (including the ability to re-order the two stores due to the relaxed
> ordering, assuming they can be proven to not alias - and please don't
> use the idiotic type-based aliasing rules).
>
> Bringing up your first example is pure and utter confusion.

Sorry if it was confusing. But then maybe we need to talk about it
more, because it shouldn't be confusing if we agree on what the memory
model allows and what not. I had originally picked the example because
it was related to the example Paul/Peter brought up.

> Don't do
> it. Instead, show what are obvious and valid transformations, and then
> you can bring up these kinds of combinations as "look, this is
> obviously also correct".

I have my doubts whether the best way to reason about the memory model
is by thinking about specific compiler transformations. YMMV,
obviously.

The -- kind of vague -- reason is that the allowed transformations will
be more complicated to reason about than the allowed output of a
concurrent program when understanding the memory model (ie, ordering and
interleaving of memory accesses, etc.). However, I can see that when
trying to optimize with a hardware memory model in mind, this might look
appealing.

What the compiler will do is exploiting knowledge about all possible
executions. For example, if it knows that x is always 1, it will do the
transform. The user would need to consider that case anyway, but he/she
can do without thinking about all potentially allowed compiler
optimizations.

> Now, the reason I want to make it clear that the code example you
> point to is a crap example is that because it doesn't talk about the
> *real* issue, it also misses a lot of really important details.
>
> For example, when you say "if the compiler can prove that the
> conditional is always true" then YOU ARE TOTALLY WRONG.

What are you trying to say? That a compiler can't prove that this is
the case in any program? Or that it won't be likely it can? Or what
else?

> So why do I say you are wrong, after I just gave you an example of how
> it happens? Because my example went back to the *real* issue, and
> there are actual real semantically meaningful details with doing
> things like load merging.
>
> To give an example, let's rewrite things a bit more to use an extra variable:
>
> atomic_store(&x, 1, mo_relaxed);
> a = atomic_load(&1, mo_relaxed);
> if (a == 1)
> atomic_store(&y, 3, mo_relaxed);
>
> which looks exactly the same.

I'm confused. Is this a new example?

Or is it supposed to be a compiler's rewrite of the following?

atomic_store(&x, 1, mo_relaxed);
if (atomic_load(&x, mo_relaxed) == 1)
atomic_store(&y, 3, mo_relaxed);

If that's supposed to be the case this is the same.

If it is supposed to be a rewrite of the original example with *just*
two stores, this is obviously an incorrect transformation (unless it can
be proven that no other thread writes to x). The compiler can't make
stuff conditional that wasn't conditional before. That would change the
allowed behavior of the abstract machine.

> If you now say "if you can prove the
> conditional is always true, you can make the store unconditional", YOU
> ARE WRONG.
>
> Why?
>
> This sequence:
>
> atomic_store(&x, 1, mo_relaxed);
> a = atomic_load(&x, mo_relaxed);
> atomic_store(&y, 3, mo_relaxed);
>
> is actually - and very seriously - buggy.
>
> Why? Because you have effectively split the atomic_load into two loads
> - one for the value of 'a', and one for your 'proof' that the store is
> unconditional.

I can't follow that, because it isn't clear to me which code sequences
are meant to belong together, and which transformations the compiler is
supposed to make. If you would clarify that, then I can reply to this
part.

> Maybe you go "Nobody sane would do that", and you'd be right. But
> compilers aren't "sane" (and compiler _writers_ I have some serious
> doubts about), and how you describe the problem really does affect the
> solution.

To me, that sounds like a strong argument for a strong and precise
specification, and formal methods. If we have that, stuff can be
resolved without having to rely on "sane" people, as long as they can
follow logic. Because this removes the problem from "hopefully all of
us want the same" (which is sometimes called "sane", from both sides of
a disagreement) to "just stick to the specification".

> My description ("you can combine two subsequent atomic accesses, if
> you are careful about still having the same ordering points") doesn't
> have the confusion.

I'd much rather take the memory model's specification. One example is
that it separates safety/correctness guarantees from liveness/progress
guarantees. "the same ordering points" can be understood as not being
able to merge subsequent loads to the same memory location. Both
progress guarantees and correctness guarantees can affect this, but they
have different consequences. For example, if it may be perfectly fine
to assume that two subsequent loads to the same memory location may
always execute atomically (and thus join them), you don't want to do
this to all such loads in a spin-wait loop (or you'll likely prevent
forward progress).

In the end, we need to define "are careful about" and similar clauses.
Eventually, after defining everything, we'll end up with a memory model
that is specified in as much detail as C11's is.

> It makes it clear that no, you can't speculate
> writes, but yes, obviously you can combine certain things (think
> "write buffers" and "memory caches"), and that may make you able to
> remove the conditional. Which is very different from speculating
> writes.

It's equally fine to reason about executions and allowed interleavings
on the level of the source program (ie, the memory model and abstract
machine). For example, combining certain things is essentially assuming
the atomic execution of those things; it will be clear from the program,
without thinking write buffers and caches, whether that's possible or
not in the source program. If it is, the compiler can do it too. The
compiler must not loose anything (eg, memory orders) when doing that; if
it wants to remove memory orders, it must show that this doesn't result
in executions not possible before. Also, it must not prevent executions
that ensure forward progress (e.g., a spin-wait loop should better not
be atomic wrt. everything else, or you'll block forever).

> But my description also makes it clear that the transformation above
> was buggy, but that rewriting it as
>
> a = 1;
> atomic_store(&y, 3, mo_relaxed);
> atomic_store(&x, a, mo_relaxed);
>
> is fine (doing the re-ordering here just to make a point).

Oh, so you were assuming that a is actually used afterwards?

> So I suspect we actually agree, I just think that the "example" that
> has been floating around, and the explanation for it, has been
> actively bad and misleading.

I do hope we agree. I'm still confused why you think the branch needs
to be emitted in the first example I brought up, but other than that, we
seem to be on the same page. I'm hopeful we can clarify the other
points if we keep talking about different examples.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/