Re: [RFC] tools/memory-model: Rule out OOTA

From: Jonas Oberhauser
Date: Tue Jan 21 2025 - 05:37:21 EST




On 1/17/2025 at 9:01 PM, Alan Stern wrote:
On Fri, Jan 17, 2025 at 12:29:34PM +0100, Jonas Oberhauser wrote:


On 1/17/2025 at 12:02 AM, Alan Stern wrote:
On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote:
I'm well aware that an absolute definition of semantic dependencies is not
easy to give.

In fact it's undecidable. No tool is ever going to be able to detect
semantic dependencies perfectly.

It depends.
Firstly, let's consider that the tool only runs on finite (or "finitized")
programs.

A program may be finite; that doesn't mean its executions are. Programs
can go into infinite loops. Are you saying that the tool would stop
verifying executions after (say) a billion steps? But then it would not
be able to detect semantic dependences in programs that run longer.

Yes, what I said is not fully correct. Finite (non-trivial-)loop-free programs, or programs with finite (representative) execution spaces, would have been more correct.




Secondly, your definition depends on the compiler.

So in the sense that we don't know the compiler, it is undecidable.
But if you fix the compiler, you could still enumerate all executions under
that compiler and compute whether that compiler has a dependency or not.

You can't enumerate the executions unless you put an artificial bound on
the length of each execution, as noted above.

Note that for many practical cases it is possible to write test cases where the bound is not artificial, in the sense that it is at least as large as the bound needed for exhaustive enumeration.
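
For instance (a hedged sketch in the style of the example further below; the variable names are made up), a loop-free message-passing test such as

P0 {
x = 1;
flag = 1;
}
P1 {
r0 = flag;
r1 = x;
}

has only a handful of executions, since each load can read from only a fixed, finite set of stores; any step bound large enough to cover its longest execution is therefore exact rather than artificial, and enumeration up to that bound is exhaustive.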


But as I mentioned before, I think you can define semantic dependencies more
appropriately because you don't really need to preserve semantic
dependencies in C++, and in LKMM (and VMM) you have volatiles that restrict
what kind of dependency-eliminations the compiler can do.

What makes you think this "more appropriate" definition of semantic
dependency will be any easier to detect than the original one?

For starters, as you mentioned, the compiler has to assume any value is possible.

Which means that if any other value would lead to a "different behavior", you know you have a semantic dependency. You can detect this in a per-thread manner, independently of the compiler.

Without the assumption of volatile, even a different value that could actually be generated in another run of the same program does not prove a semantic dependency.
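
As a hedged per-thread illustration (the variable names are made up), assuming the loads are treated as volatile so the compiler must admit every possible value of r:

r = x;
y = r + 1; /* a different value of r changes the stored value: semantic dependency */
z = r * 0; /* every value of r stores 0: no semantic dependency */

Both classifications can be made by looking at this one thread alone, without knowing anything about the rest of the program or the compiler.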


Yes. Actually I wonder: if you impose this "all loads are volatile" restriction, can a globally analysing compiler still perform any optimizations that a locally analysing compiler can not?

Yes, although whether they are pertinent is open to question. For
example, a globally optimizing compiler may observe that no thread ever
reads the value of a particular shared variable and then eliminate that
variable.

Oh, I meant the "all atomic objects are volatile" restriction, not just the loads. In that case, all stores to the object - even if never read - still need to be generated.
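
For example (a sketch, assuming x is an atomic object that no thread ever reads):

x = 1; /* must still be emitted under the volatile treatment */
x = 2; /* must also be emitted; the two stores cannot be merged or dropped */

whereas a non-volatile treatment could delete both stores once it sees that x is never read.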

Are there still any optimizations?

It rather seems the other way, that the locally analysing quasi-volatile
compiler can at least do some local optimizations, while the global volatile
compiler can not (e.g., x=1; y=x; can not be x=1; y=1; for the global
compiler because x is volatile now).

In the paper we speculate that it may be sufficient to require globally
optimizing compilers to treat atomics as quasi volatile with the added
restriction that loads must not be omitted.

I see.

But then how would you characterize semantic dependencies, if you want
to allow the definition to include some dependencies that aren't
semantic but not so many that you ever create a cycle?

I don't know which definition is correct yet, but the point is that you
don't have to avoid making so many dependencies that you would create a
cycle. It just forbids the compiler from looking for cycles and optimizing
based on the existence of the cycle. (Looking for unused vars and removing
those is still allowed, under the informal argument that this simulates an
execution where no OOTA happened)

At the moment, the only underlying ideas we have driving our notions of
semantic dependency is that a true semantic dependency should be one
which must give rise to order in the machine code. In turn, this order
prevents OOTA cycles from occurring during execution. That is the
essence of the paper.

Can your ideas be fleshed out to a comparable degree?

Well, I certainly have not put as much deep thought into it as you have, so it is certainly more fragile. But my current thoughts are along these lines:

We consider inter-thread semantic dependencies (isdep) based on the set of allowed executions + thread-local optimizations + what would be allowed to happen if rfe edges along the dependencies become rfi edges due to merging. So the definition is not compiler-specific.

Those provide order at the machine level unless the compiler restricts the set of executions through its choices (especially cross-thread optimizations) and then uses the restricted set of executions (i.e., the possible input range) to optimize the code further.
I haven't thought deeply about all the different optimizations that are possible there.

For an example of how it may not provide order, if we have accesses related by isdep;rfe;isdep, then the compiler could merge all the involved threads into one, and the accesses could no longer have any dependency.

So it is important that one can not forbid all isdep;rf cycles; the axiom would only be that isdep;rf is irreflexive.


When merging threads, if in an OOTA execution there is an inter-thread semantic dependency from r in one of the merged threads to w in one of the merged threads, such that r reads from a non-merged thread and w is read by a non-merged thread in the OOTA cycle, then it is still an inter-thread dependency which still preserves the order. This prevents the full cycle even if some other sub-edges within the merged threads are no longer dependencies.

But if r is reading from w in the OOTA cycle, then the compiler (which is not allowed to look for the cycle) has to put r before w in the merged po, preventing r from reading w. (It can also omit r by giving it a value that it could legally read in this po, but it still can't get its value from w this way.)

E.g.,
P0 {
y = x;
}
P1 {
z = y;
}
... (some way to assign x dependent on z)

would have an inter-thread dependency from the load of x to the store to z. If P0 and P1 are merged, and there are no other accesses to y, then the intra-thread dependencies from the load of x to the store to y etc. are eliminated, but the inter-thread dependency from x to z remains.
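
To spell that out, a hedged sketch of what the merged thread might look like (assuming nothing else accesses y, and using a made-up temporary r):

P0P1 {
r = x; /* still reads from a non-merged thread */
y = r; /* may even be dropped entirely if y is otherwise unused */
z = r; /* still read by a non-merged thread; the dependency on the load of x survives */
}

The intra-thread links through y can disappear, but the inter-thread dependency from the load of x to the store to z cannot.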



So for example, if we merge x = y || y = x, the merge can turn it into
x=y=x or y=x=y (and then into an empty program), but not into a cyclic
dependency. So even though one side of the dependency may be violated, for the sake of OOTA, we could still label both sides as dependent.

They _are_ both semantically dependent (in the original parallel
version, I mean). I don't see what merging has to do with it.

Note that I was considering the case where they are not volatile.
With a compiler that does not treat them as volatile and that merges the two threads, under your definition there is no semantic dependency in at least one direction, because there is no hardware realization H where you read something else (of course you exclude such compilers, but I think realistically they should be allowed).

My point is that we can say they are semantically dependent for the sake of OOTA, without deriving any ordering from these dependencies other than the cyclical one, and therefore allow compilers to do one of the two optimizations (make x=y no longer depend on y, or make y=x no longer depend on x), but not to do a cycle analysis that removes both dependencies and generates an OOTA value (a compiler can still remove both dependencies by leaving x and y unchanged, though).

I don't understand.

Does the explanation above help?
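
To make the x = y || y = x case concrete, here is a hedged sketch of the two possible merge orders (with non-volatile x and y):

x = y; /* still depends on y */
y = x; /* now just re-stores y's old value: the dependency on x is gone */

or

y = x; /* still depends on x */
x = y; /* now just re-stores x's old value: the dependency on y is gone */

In either order one of the two dependencies collapses, but the other survives, so no out-of-thin-air value can be produced.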


I do not like the idea of tying the definition of OOTA (which needs to
apply to every implementation) to a particular clang compiler.

But that is what you have done, no? Whether something is an sdep depends on
the compiler, so compiler A could generate an execution that is OOTA in the
sdep definition of compiler B.

Yes, but this does not tie the definition to _one particular_ compiler.
That is, we don't say "This dependency is semantic because of the way
GCC 14.2.1 handles it." Rather, we define for each compiler whether a
dependency is semantic for _that_ compiler.

Yes, but the result is still that every compiler has its own memory model (at least with respect to which graphs are ruled out as OOTA). So there is no definition of 'G is OOTA', only 'G is OOTA on compiler x'.

The example I gave of the tool verifying the program relative to one specific compiler also does not give a definition of 'G is OOTA'; in fact, it does not specify OOTA at all. It simply says ``with compiler x, we get the following "semantic dependencies" and the following graphs...''

And as long as compiler x does not generate OOTA, there will be no OOTA graphs among those.

So it does not solve or tackle the theoretical problem in any way, and can not do cross-compiler verification. But it will already apply the 'naive-dependency-breaking optimizations' that you would not see in e.g. herd7.


So you either need a definition of semantic dependency that
a) applies in all cases we practically need and
b) is guaranteed by all compilers

or we need to live with the fact that we do not have a semantic dependency
definition that is independent of the compilation (even of the same
compiler) and need to do our verification for that specific compilation.

Add the qualification that the definition should be practical to
evaluate, and I agree.

Yes, an important point. And hard to resolve as well.


Best wishes,
jonas