Re: [RFC] tools/memory-model: Rule out OOTA

From: Alan Stern
Date: Thu Jan 09 2025 - 14:27:53 EST


On Thu, Jan 09, 2025 at 05:44:54PM +0100, Jonas Oberhauser wrote:
>
>
> Am 1/9/2025 um 5:17 PM schrieb Alan Stern:
> > What do you mean by "opaque code in the compiler barrier"? The
> > compiler_barrier() instruction doesn't generate any code at all; it
> > merely directs the compiler not to carry any knowledge about values
> > stored in memory from one side of the barrier to the other.
>
> What I mean by "opaque" is that the compiler does not analyze the code
> inside the compiler barrier, so it must treat it as a black box that could
> manipulate memory arbitrarily within the limitation that it can not guess
> the address of memory.

Okay, I see what you're getting at. The way you express it is a little
confusing, because in fact there is NO code inside the compiler barrier
(although the compiler doesn't know that) -- the barrier() macro expands
to an empty assembler instruction, along with an annotation telling the
compiler that this instruction may affect the contents of memory in
unknown and unpredictable ways.

> So for example, in
>
> int a = 1;
> barrier();
> a = 2;
> //...
>
> the compiler does not know how the code inside barrier() accesses memory,
> including volatile memory.

I would say rather that the compiler does not know that the values
stored in memory are the same before and after the barrier(). Even the
values of local variables whose addresses have not been exported.

> But it knows that it can not access `a`, because the address of `a` has
> never escaped before the barrier().

I don't think this is right. barrier is (or can be) a macro, not a
function call with its own scope. As such, it has -- in principle --
the ability to export the address of a.

Question: Can the compiler assume that no other threads access a between
the two stores, on the grounds that this would be a data race? I'd
guess that it can't make that assumption, but it would be nice to know
for sure.

> So it can change this to:
>
> barrier();
> int a = 2;
> // ...
>
> But if we let the address of `a` escape, for example with some external
> function foo(int*):
>
> int a;
> foo(&a);
> a = 1;
> barrier();
> a = 2;
> // ...
>
> Then the compiler has to assume that the code of foo and barrier might be
> something like this:
>
> foo(p) { SPECIAL_VARIABLE = p; }
> barrier() { TURN_THE_BREAKS_ON = *SPECIAL_VARIABLE; }

I think you're giving the compiler too much credit. The one thing the
compiler is allowed to assume is that the code, as written, does not
contain a data race or other undefined behavior.

> and it must make sure that the value of `a` before barrier() is 1.
>
> That is at least my understanding.

This is the sort of question that memory-barriers.txt should answer.
It's closely related to the question I mentioned above.

> In fact, even if `a` is unused after a=2, the compiler can only eliminate
> `a` in the former case, but in the latter case, still needs to ensure that
> the value of `a` before barrier() is 1 (but it can eliminate a=2).

And what if a were a global shared variable instead of a local one? The
compiler is still allowed to do weird optimizations on it, since the
accesses aren't volatile. The barrier() merely prevents the compiler
from using its knowledge that a is supposed to contain 1 before the
barrier to influence its decisions about how to optimize the code
following the barrier.

> > That wasn't true in the C++ context of the paper Paul and I worked on.
> > Of course, C++ is not our current context here.
>
> Yes, you are completely correct. In C++ (or pure C), where data races are
> prevented by compiler/language-builtins rather than with
> compiler-barriers/volatiles, all my assumptions break.
>
> That is because the compiler absolutely knows that an atomic_store(&x) does
> not access any memory location other than x, so it can do a lot more
> "harmful" optimizations.
>
> That's why I said such a language model should just exclude global OOTA by
> fiat.

One problem with doing this is that there is no widely agreed-upon
formal definition of OOTA. A cycle in (rwdep ; rfe) isn't the answer
because rwdep does not encapsulate the notion of semantic dependency.

> I have to read your paper again (I think I read it a few months ago) to
> understand if the trivial OOTA would make even that vague axiom unsound
> (my intuition says that if the OOTA is never observed by influencing the
> side-effect, then forbidding OOTA makes no difference to the set of
> "observable behaviors" of a C++ program even there is a trivial OOTA, and if
> the OOTA has visible side-effects, then it is acceptable for the compiler
> not to do the "optimization" that turns it into a trivial OOTA and choose
> some other optimization instead, so we can as well forbid the compiler from
> doing it).

If an instance of OOTA is never observed, does it exist?

In the paper, I speculated that if a physical execution of a program
matches an abstract execution containing such a non-observed OOTA cycle,
then it also matches another abstract execution in which the cycle
doesn't exist. I don't know how to prove this conjecture, though.

Alan