On Thu, Jan 09, 2025 at 05:44:54PM +0100, Jonas Oberhauser wrote:
Am 1/9/2025 um 5:17 PM schrieb Alan Stern:
What do you mean by "opaque code in the compiler barrier"? The
compiler_barrier() instruction doesn't generate any code at all; it
merely directs the compiler not to carry any knowledge about values
stored in memory from one side of the barrier to the other.
What I mean by "opaque" is that the compiler does not analyze the code
inside the compiler barrier, so it must treat it as a black box that could
manipulate memory arbitrarily within the limitation that it can not guess
the address of memory.
Okay, I see what you're getting at. The way you express it is a little
confusing, because in fact there is NO code inside the compiler barrier
(although the compiler doesn't know that) -- the barrier() macro expands
to an empty assembler instruction, along with an annotation telling the
compiler that this instruction may affect the contents of memory in
unknown and unpredictable ways.
>
So for example, in
int a = 1;
barrier();
a = 2;
//...
the compiler does not know how the code inside barrier() accesses memory,
including volatile memory.
I would say rather that the compiler does not know that the values
stored in memory are the same before and after the barrier().
Even the
values of local variables whose addresses have not been exported.
> I don't think this is right. barrier is (or can be) a macro, not a
But it knows that it can not access `a`, because the address of `a` has
never escaped before the barrier().
Question: Can the compiler assume that no other threads access a between
the two stores, on the grounds that this would be a data race? I'd
guess that it can't make that assumption, but it would be nice to know
for sure.
But if we let the address of `a` escape, for example with some external
function foo(int*):
int a;
foo(&a);
a = 1;
barrier();
a = 2;
// ...
Then the compiler has to assume that the code of foo and barrier might be
something like this:
foo(p) { SPECIAL_VARIABLE = p; }
barrier() { TURN_THE_BREAKS_ON = *SPECIAL_VARIABLE; }
I think you're giving the compiler too much credit. The one thing the
compiler is allowed to assume is that the code, as written, does not
contain a data race or other undefined behavior.
That is at least my understanding.
This is the sort of question that memory-barriers.txt should answer.
It's closely related to the question I mentioned above.
In fact, even if `a` is unused after a=2, the compiler can only eliminate
`a` in the former case, but in the latter case, still needs to ensure that
the value of `a` before barrier() is 1 (but it can eliminate a=2).
And what if a were a global shared variable instead of a local one? The
compiler is still allowed to do weird optimizations on it, since the
accesses aren't volatile.
That wasn't true in the C++ context of the paper Paul and I worked on.
Of course, C++ is not our current context here.
Yes, you are completely correct. In C++ (or pure C), where data races are
prevented by compiler/language-builtins rather than with
compiler-barriers/volatiles, all my assumptions break.
That is because the compiler absolutely knows that an atomic_store(&x) does
not access any memory location other than x, so it can do a lot more
"harmful" optimizations.
That's why I said such a language model should just exclude global OOTA by
fiat.
One problem with doing this is that there is no widely agreed-upon
formal definition of OOTA. A cycle in (rwdep ; rfe) isn't the answer
because rwdep does not encapsulate the notion of semantic dependency.
I have to read your paper again (I think I read it a few months ago) to
understand if the trivial OOTA would make even that vague axiom unsound
(my intuition says that if the OOTA is never observed by influencing the
side-effect, then forbidding OOTA makes no difference to the set of
"observable behaviors" of a C++ program even there is a trivial OOTA, and if
the OOTA has visible side-effects, then it is acceptable for the compiler
not to do the "optimization" that turns it into a trivial OOTA and choose
some other optimization instead, so we can as well forbid the compiler from
doing it).
If an instance of OOTA is never observed, does it exist?
In the paper, I speculated that if a physical execution of a program
matches an abstract execution containing such a non-observed OOTA cycle,
then it also matches another abstract execution in which the cycle
doesn't exist. I don't know how to prove this conjecture, though.