Re: [RFC] tools/memory-model: Rule out OOTA

From: Jonas Oberhauser
Date: Fri Jan 17 2025 - 06:30:05 EST




On 1/17/2025 12:02 AM, Alan Stern wrote:
> On Thu, Jan 16, 2025 at 08:08:22PM +0100, Jonas Oberhauser wrote:
>> I'm well aware that an absolute definition of semantic dependencies is not
>> easy to give.
>
> In fact it's undecidable. No tool is ever going to be able to detect
> semantic dependencies perfectly.

It depends.
Firstly, let's consider that the tool only runs on finite (or "finitized") programs.
Secondly, your definition depends on the compiler.

So in the sense that we don't know the compiler, it is undecidable.
But if you fix the compiler, you could still enumerate all executions under that compiler and compute whether there is a dependency under that compiler or not.

But as I mentioned before, I think you can define semantic dependencies more appropriately, because you don't really need to preserve semantic dependencies in C++, and in LKMM (and VMM) you have volatiles that restrict what kind of dependency eliminations the compiler can do.
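To make that last point concrete (a minimal sketch; the function names are mine, and I am reusing the array example quoted below, assuming b only ever holds 0 or 1): with a plain access the compiler may delete the load of b entirely once the store collapses, whereas a READ_ONCE() load must still be emitted.

int a, b;
int *x[] = { &a, &a };

void plain(void)
{
	int i = b;		/* once *x[i] collapses to a, this plain
				 * load is dead and may be deleted */
	*x[i] = 1;		/* both entries are &a, so this may
				 * become just a = 1 */
}

void with_volatile(void)
{
	int i = READ_ONCE(b);	/* volatile load: must still be performed */
	*x[i] = 1;		/* even though the store may still be
				 * emitted as a = 1 */
}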


>> int * x[] = { &a, &a };
>> int i = b;
>> *x[i] = 1;
>>
>> here the semantic address dependency from loading b to loading x[i] and the
>> semantic address dependency from loading x[i] to storing to *x[i] do not
>> together form a semantic dependency anymore, because *x[i] is always a. So
>> this whole code can just become a=1, and with mirrored code you can get
>> a=b=1, which is an sdep | rfe cycle.

> We regard sdep as extending from loads (or sets of loads) to stores.
> (Perhaps the paper doesn't state this explicitly -- it should.) So an
> sdep ; sdep sequence is not possible.

I see. There is a line in the introduction ("Roughly speaking, there is a semantic dependency from a given load
to a given store when all other things being equal, a change in the value loaded can
change the value stored or prevent the store from occurring at all"), but I read it too carelessly and treated it as an implication rather than an iff.

I suppose at some point you replace the definition of OOTA anyway with a version where only semantic dependencies from loads to stores are considered, so the concern then becomes irrelevant.


>> - On the one hand, that definition makes a lot of sense. On the other hand,
>> at least without the atomics=volatile restriction it would have the downside
>> that a compiler which generates just a single execution for your program can
>> say that there are no dependencies whatsoever and generate all kinds of "out
>> of thin air" behaviors.
>
> That is so. But a compiler which examines only a single thread at a
> time cannot afford to generate just a single execution, because it
> cannot know what values the loads will obtain when the full program
> runs.

>> I am not sure if that gets really resolved by the volatile restrictions you
>> put, but either way those seem far stronger than what one would want.
>
> It does get resolved, because treating atomics as volatile also means
> that the compiler cannot afford to generate just a single execution.
> Again, because it cannot know what values the loads will obtain at
> runtime, since volatile loads can yield any value in a non-benign
> environment.


Yes. Actually I wonder: if you impose this "all loads are volatile" restriction, can a globally analysing compiler still perform any optimizations that a locally analysing compiler cannot?

It rather seems to be the other way around: the locally analysing quasi-volatile compiler can at least do some local optimizations, while the global volatile compiler cannot (e.g., x=1; y=x; cannot become x=1; y=1; for the global compiler because x is volatile now).
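A minimal sketch of that example (names are mine; a plain variable stands in for the non-volatile atomic, a volatile one for the atomics=volatile view):

int y;

int x;			/* local quasi-volatile view: not really volatile */
void local_compiler(void)
{
	x = 1;
	y = x;		/* may be forwarded to y = 1 */
}

volatile int vx;	/* global view under atomics=volatile */
void global_compiler(void)
{
	vx = 1;
	y = vx;		/* the volatile load must be performed,
			 * so this cannot become y = 1 */
}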

>> I would say that the approach with volatile is overzealous because it tries
>> to create a "local" order solution to the problem that only requires a
>> "global" ordering solution. Since not every semantic dependency needs to
>> provide order in C++ -- only the cycle of dependencies -- it is totally ok
>> to add too many semantic dependency edges to a program, even those that are
>> not going to be exactly maintained by every compiler, as long as we can
>> ensure that globally, no dependency cycle occurs.
>
> But then how would you characterize semantic dependencies, if you want
> to allow the definition to include some dependencies that aren't
> semantic but not so many that you ever create a cycle?

I don't know which definition is correct yet, but the point is that you don't have to avoid adding so many dependencies that you would create a cycle. It just forbids the compiler from looking for cycles and optimizing based on the existence of the cycle. (Looking for unused variables and removing those is still allowed, under the informal argument that this simulates an execution where no OOTA happened.)
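(For the unused-variable case, a minimal sketch, assuming C11 relaxed atomics without the volatile treatment; the names are mine:)

#include <stdatomic.h>

void reader(atomic_int *x)
{
	/* r is never used, so the compiler may delete this relaxed
	 * load.  Informally that is harmless for OOTA: the result is
	 * indistinguishable from an execution in which the load simply
	 * returned some value that nobody looked at. */
	int r = atomic_load_explicit(x, memory_order_relaxed);
}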


> This sounds like
> an even worse problem than we started with!
>
>
>> So for example, if we merge x = y || y = x, the merge can turn it into
>> x=y=x or y=x=y (and then into an empty program), but not into a cyclic
>> dependency. So even though one side of the dependency may be violated, for
>> sake of OOTA, we could still label both sides as dependent.
>
> They _are_ both semantically dependent (in the original parallel
> version, I mean). I don't see what merging has to do with it.

Note that I was considering the case where they are not volatile.
With a compiler that does not treat them as volatile and merges the two threads, under your definition there is no semantic dependency in at least one direction, because there is no hardware realization H where you read something else (of course you exclude such compilers, but I think realistically they should be allowed).

My point is that we can say they are semantically dependent for the sake of OOTA, without deriving any ordering from these dependencies other than the cyclical one, and therefore allow compilers to do one of the two optimizations (make x=y no longer depend on y, or make y=x no longer depend on x), but not to do a cycle analysis that removes both dependencies and generates an OOTA value (it can still remove both dependencies by leaving x and y unchanged, though).
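A sketch of what I mean (C11 relaxed atomics, no volatile treatment; the LD/ST shorthands are mine):

#include <stdatomic.h>

atomic_int x, y;
#define LD(v)    atomic_load_explicit(&(v), memory_order_relaxed)
#define ST(v, e) atomic_store_explicit(&(v), (e), memory_order_relaxed)

/* Parallel original: */
void P0(void) { ST(x, LD(y)); }	/* x = y */
void P1(void) { ST(y, LD(x)); }	/* y = x */

/* After a whole-program compiler merges the two threads: */
void merged(void)
{
	ST(x, LD(y));	/* x = y */
	ST(y, LD(x));	/* y = x: in the merged program this provably
			 * rewrites y with its own old value, so this
			 * dependency may be dropped and the statement
			 * removed.  What stays forbidden is using the
			 * x -> y -> x cycle to invent a value, e.g.
			 * emitting x = y = 42. */
}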


>> So I'm not convinced that for either of the two cases you need to do a
>> compiler-specific definition of dependencies.
>
> For the C++ case, I cannot think of any other way to approach the
> problem. Nor do I see anything wrong with a compiler-specific
> definition, given how nebulous the whole idea is in the first place.

>> BTW, for what it's worth, Dat3M in a sense uses the clang dependencies - it
>> first allows the compiler to do its optimizations, and then verifies the
>> llvm-ir (with a more hardware-like dependency definition).
>
> What do you mean by "verifies"? What property of the llvm-ir does it
> verify?

Verify that the algorithm, e.g., qspinlock, has no executions that are buggy, e.g., have a data race on the critical section.

It does so for a fixed test case with a small number of threads, i.e., a finite program.
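For concreteness, the shape of such a test (a hypothetical sketch in kernel style, not Dat3M's actual input format; the checked property is that the plain accesses to the shared data never race):

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(lock);	/* a qspinlock on typical configs */
static int shared;		/* plain accesses, protected by lock */

static void t0(void)
{
	spin_lock(&lock);
	shared = 1;		/* must never race with t1's read */
	spin_unlock(&lock);
}

static void t1(void)
{
	int r;

	spin_lock(&lock);
	r = shared;
	spin_unlock(&lock);
	(void)r;
}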
>> I think something like that can be a good practical solution with fewer
>> problems than other attempts to approximate the solution.

> I do not like the idea of tying the definition of OOTA (which needs to
> apply to every implementation) to a particular clang compiler.

But that is what you have done, no? Whether something is an sdep depends on the compiler, so compiler A could generate an execution that is OOTA under the sdep definition of compiler B.
(Of course, with the assumption of atomics=volatile, it may just be that we are back to the beginning and all "naive" semantic dependencies are actually semantic dependencies for all compilers.)

Anyway, what I meant is not about tying the definition of OOTA to one compiler or another. As I mentioned, I think it can be fine to define OOTA in the same way for all compilers.
What I meant is to specialize the memory model to a specific compiler, as long as that is the compiler that is used in reality.
So long as your code does not depend on the ordering of any semantic dependencies, the verification can be cross-platform.

And although...

> I'm not particularly concerned about OOTA or semantic dependencies in
> LKMM.

... there is code that relies on semantic dependencies, e.g. RCU read-side critical-section code (even if we do not care about OOTA).
For that code, the semantic dependencies must be guaranteed to create ordering.
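For example (a sketch of the usual pattern, with a made-up struct and names), the address dependency from the rcu_dereference() load to the dereference of the returned pointer is exactly the kind of semantic dependency that must provide ordering:

struct foo {
	int a;
};

struct foo __rcu *gp;

int reader(void)
{
	struct foo *p;
	int r = -1;

	rcu_read_lock();
	p = rcu_dereference(gp);	/* volatile load of the pointer */
	if (p)
		r = p->a;		/* address-dependent load: must be
					 * ordered after the load of gp */
	rcu_read_unlock();
	return r;
}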

So we either need a definition of semantic dependency that
a) applies in all cases we practically need and
b) is guaranteed by all compilers,

or we need to live with the fact that we do not have a semantic dependency definition that is independent of the compilation (even of the same compiler) and need to do our verification for that specific compilation.

I think for LKMM we could give such a semantic dependency definition, because it uses volatile, and use it to verify RCU read-side code. But we currently do not have one. What I meant to say is that using the actual optimizations (of whatever compiler you use) first to remove syntactic dependencies, and then verifying under the assumption of whatever dependencies are left, may be better than trying to approximate dependencies in some way in cat -- given that we want to verify and rely on the code today, not in X years when we all agree on a compiler-independent definition of semantic dependency.

I think for C++ consume we could also give one, by simply restricting some compiler optimizations for consume loads (and doing whatever needs to be done on Alpha). Or just kick consume out and not have any dependency ordering except the global OOTA case.

Sorry for the confusion; I think there are so many different combinations/battlefields (OOTA vs. just semantic dependencies, volatile vs. non-volatile atomics, verifying the model vs. verifying a piece of code, etc.) that it becomes hard for me not to confuse myself, let alone others :))


Best wishes,
jonas