Re: [RFC] tools/memory-model: Rule out OOTA

From: Jonas Oberhauser
Date: Thu Jan 09 2025 - 11:45:36 EST




Am 1/9/2025 um 5:17 PM schrieb Alan Stern:
On Wed, Jan 08, 2025 at 08:22:07PM +0100, Jonas Oberhauser wrote:


Am 1/8/2025 um 7:47 PM schrieb Alan Stern:
On Wed, Jan 08, 2025 at 06:33:05PM +0100, Jonas Oberhauser wrote:


Am 1/7/2025 um 5:09 PM schrieb Alan Stern:
Is this really valid? In the example above, if there were no other
references to a or b in the rest of the program, the compiler could
eliminate them entirely.

In the example above, this is not possible, because the addresses of a and b
have escaped the function and the objects are not deallocated before
synchronization happens. Therefore the compiler must assume that a and b are
accessed inside the compiler barrier.

I'm not quite sure what you mean by that, but if the compiler has access
to the entire program and can do a global analysis then it can recognize
cases where accesses that may seem to be live aren't really.

Even in trivial cases where the compiler has access to all the
source, compiler barriers should be opaque to the compiler.

Since it is opaque,

*a = 1;
compiler_barrier();

might as well be

*a = 1;
*d = *a; // *d is in device memory

and so in my opinion the compiler needs to ensure that the value of *a right
before the compiler barrier is 1.

Of course, this holds only if the address of *a could possibly be legally
known to the opaque code in the compiler barrier.

What do you mean by "opaque code in the compiler barrier"? The
compiler_barrier() instruction doesn't generate any code at all; it
merely directs the compiler not to carry any knowledge about values
stored in memory from one side of the barrier to the other.

What I mean by "opaque" is that the compiler does not analyze the code inside the compiler barrier; it must treat it as a black box that could manipulate memory arbitrarily, within the limitation that it cannot guess the address of any memory location that has not escaped.

So for example, in

int a = 1;
barrier();
a = 2;
//...

the compiler does not know how the code inside barrier() accesses memory, including volatile memory.
But it knows that this code cannot access `a`, because the address of `a` has never escaped before the barrier().

So it can change this to:

barrier();
int a = 2;
// ...

But if we let the address of `a` escape, for example with some external function foo(int*):

int a;
foo(&a);
a = 1;
barrier();
a = 2;
// ...

Then the compiler has to assume that the code of foo and barrier might be something like this:

foo(p) { SPECIAL_VARIABLE = p; }
barrier() { TURN_THE_BRAKES_ON = *SPECIAL_VARIABLE; }

and it must make sure that the value of `a` before barrier() is 1.

That is at least my understanding.

In fact, even if `a` is unused after `a = 2`, the compiler can eliminate `a` only in the former case; in the latter case it still needs to ensure that the value of `a` before barrier() is 1 (though it can eliminate `a = 2`).



Note that it does _not_ necessarily prevent the compiler from carrying
knowledge that a memory location is unused from one side of the barrier
to the other.

Yes, or even merging/moving assignments to the memory location across a barrier(), as in the example above.


However, I admit this objection doesn't really apply to Linux kernel
programming.

(Whether the result could count as OOTA is
open to question, but that's not the point.) Is it not possible that a
compiler might find other ways to defeat your intentions?

The main point (which I already mentioned in the previous e-mail) is the case
where the object is deallocated without synchronization (or never escapes the
function in the first place).

And indeed, any such case renders the added rule unsound. It is in a sense
unrelated to OOTA; cases where the load/store can be elided are never OOTA.

That is a matter of definition. In our paper, Paul and I described
instances of OOTA in which all the accesses have been optimized away as
"trivial".

Yes, by OOTA I mean a rwdep;rfe cycle.

In the absence of data races, such a cycle can't be optimized away, because it
is created with volatile or compiler-barrier-protected accesses.

That wasn't true in the C++ context of the paper Paul and I worked on.
Of course, C++ is not our current context here.

Yes, you are completely correct. In C++ (or pure C), where data races are prevented by compiler/language built-ins rather than by compiler barriers and volatiles, all my assumptions break.

That is because the compiler absolutely knows that an atomic_store(&x) does not access any memory location other than x, so it can do a lot more "harmful" optimizations.

That's why I said such a language model should just exclude global OOTA by fiat.

I have to read your paper again (I think I read it a few months ago) to understand whether trivial OOTA would make even that vague axiom unsound.
My intuition says that if the OOTA is never observed by influencing a side effect, then forbidding OOTA makes no difference to the set of "observable behaviors" of a C++ program, even if there is a trivial OOTA; and if the OOTA has visible side effects, then it is acceptable for the compiler not to do the "optimization" that turns it into a trivial OOTA and to choose some other optimization instead, so we may as well forbid the compiler from doing it.

What I was trying to get at above is that compiler-barrier protection
does not necessarily guarantee that non-volatile accesses can't be
optimized away. (However, it's probably safe for us to make such an
assumption here.)

Yes, I agree.

It seems the ideal solution is to let Live be defined by the tools, which
should keep up with or exceed the analysis done by state-of-the-art compilers.

I don't think it works that way in practice. :-)

Yeah... maybe not :(

Best wishes,
jonas