Re: [RFC] tools/memory-model: Rule out OOTA
From: Alan Stern
Date: Thu Jan 09 2025 - 22:13:13 EST
On Thu, Jan 09, 2025 at 09:09:00PM +0100, Jonas Oberhauser wrote:
> > > So for example, in
> > >
> > > int a = 1;
> > > barrier();
> > > a = 2;
> > > //...
> > >
> > > the compiler does not know how the code inside barrier() accesses memory,
> > > including volatile memory.
> >
> > I would say rather that the compiler does not know that the values
> > stored in memory are the same before and after the barrier().
> >
> > Even the
> > values of local variables whose addresses have not been exported.
>
> No, this is not true. I used to think so too until a short while ago.
>
> But if you look at the output of gcc -o3 you will see that it does happily
> remove `a` in this function.
Isn't that consistent with what I said?
> > > But it knows that it can not access `a`, because the address of `a` has
> > > never escaped before the barrier().
> >
> > I don't think this is right. barrier is (or can be) a macro, not a
> > function call with its own scope. As such, it has -- in principle --
> > the ability to export the address of a.
>
> See above. Please test it for yourself, but for your convenience here is the
> code
Oh, I believe that gcc does what you say. I'm just not sure your
explanation is entirely accurate.
> If you wanted to avoid this, you would need to expose the address of `a` to
> the asm block using one of its clobber arguments (but I'm not familiar with
> the syntax to produce a working example on the spot).
>
> >
> > Question: Can the compiler assume that no other threads access a between
> > the two stores, on the grounds that this would be a data race? I'd
> > guess that it can't make that assumption, but it would be nice to know
> > for sure.
>
> It can not make the assumption if &a has escaped. In that case, barrier()
> could be so:
>
> barrier(){
> store_release(OTHER_THREAD_PLEASE_MODIFY,&a);
>
> while (! load_acquire(OTHER_THREAD_IS_DONE));
> }
>
> with another thread doing
>
> while (! load_acquire(OTHER_THREAD_PLEASE_MODIFY)) yield();
> *OTHER_THREAD_PLEASE_MODIFY ++;
> store_release(OTHER_THREAD_IS_DONE, 1);
Bear in mind that there's a difference between what a compiler _can do_
and what gcc _currently does_.
> > I think you're giving the compiler too much credit. The one thing the
> > compiler is allowed to assume is that the code, as written, does not
> > contain a data race or other undefined behavior.
>
> Apologies, the way I used "assume" is misleading.
> I should have said that the compiler has to ensure that even if the code of
> foo() and barrier() were so, that the behavior of the code it generates is
> the same (w.r.t. observable side effects) as if the program were executed by
> the abstract machine. Or I should have said that it can *not* assume that
> the functions are *not* as defined above.
>
> Which means that TURN_THE_BREAKS_ON would need to be assigned 1.
> The only way the compiler can achieve that guarantee (while treating barrier
> as a black box) is to make sure that the value of `a` before barrier() is 1.
Who says the compiler has to treat barrier() as a black box? As far as
I know, gcc makes no such guarantee.
> > > That's why I said such a language model should just exclude global OOTA by
> > > fiat.
> >
> > One problem with doing this is that there is no widely agreed-upon
> > formal definition of OOTA. A cycle in (rwdep ; rfe) isn't the answer
> > because rwdep does not encapsulate the notion of semantic dependency.
>
> rwdep does not encapsulate any specific notion.
> It is herd7 which decides which dependency edges to add to the graphs it
> generates.
Sorry, that's what I meant: rwdep plus the decisions that herd7 makes
about which edges are dependencies.
> If herd7 would generate edges for semantic dependencies instead of for its
> version of syntactic dependencies, then rwdep is the answer.
That statement is meaningless (or at least, impossible to implement)
because there is no widely agreed-upon formal definition for semantic
dependency.
> Given that we can not define dep inside the cat model, one may as well
> define it as rwdep;rfe with the intended meaning of the dependencies being
> the semantic ones; then it is an inaccuracy of herd7 that it does not
> provide the proper dependencies.
Perhaps so. LKMM does include other features which the compiler can
defeat if the programmer isn't sufficiently careful. Still, I suspect
that changing the memory model solely with the goal of eliminating OOTA
may not be a good idea.
> Anyways LKMM should not care about syntactic dependencies, e.g.
>
>
> if (READ_ONCE(*a)) {
> WRITE_ONCE(*b,1);
> } else {
> WRITE_ONCE(*b,1);
> }
>
> has no semantic dependency and gcc does not guarantee the order between
> these two accesses, even though herd7 does give us a dependency edge.
Like I said.
> > > I have to read your paper again (I think I read it a few months ago) to
> > > understand if the trivial OOTA would make even that vague axiom unsound
> > > (my intuition says that if the OOTA is never observed by influencing the
> > > side-effect, then forbidding OOTA makes no difference to the set of
> > > "observable behaviors" of a C++ program even there is a trivial OOTA, and if
> > > the OOTA has visible side-effects, then it is acceptable for the compiler
> > > not to do the "optimization" that turns it into a trivial OOTA and choose
> > > some other optimization instead, so we can as well forbid the compiler from
> > > doing it).
> >
> > If an instance of OOTA is never observed, does it exist?
>
> :) :) :)
>
> > In the paper, I speculated that if a physical execution of a program
> > matches an abstract execution containing such a non-observed OOTA cycle,
> > then it also matches another abstract execution in which the cycle
> > doesn't exist. I don't know how to prove this conjecture, though.
>
> Yes, that also makes sense.
>
> Note that this speculation does not hold in the current LKMM though. In the
> Litmus test I shared in the opening e-mail, where the outcome 0:r1=1 /\
> 1:r1=1 is only possible with an OOTA (even though the values from the OOTA
> are never used anywhere).
If the fact that the outcome 0:r1=1 /\ 1:r1=1 has occurred is proof that
there was OOTA, then the OOTA cycle _is_ observed, albeit indirectly --
at least, in the sense that I intended. (The situation mentioned in the
paper is better described as an execution where the compiler has elided
all the accesses in the OOTA cycle.)
> With C++'s non-local model I wouldn't be totally surprised if there were
> similar examples in C++, but given that its ordering definition is a lot
> more straightforward than LKMM in that it doesn't have all these cases of
> different barriers like wmb and rmb and corner cases like Noreturn etc., my
> intuition says that there aren't any.
I'll have to give this some thought.
Alan
> But I am not going to think deeply about it for the time being.
>
> Best wishes
> jonas