On Wed, Jan 08, 2025 at 08:22:07PM +0100, Jonas Oberhauser wrote:
Am 1/8/2025 um 7:47 PM schrieb Alan Stern:
On Wed, Jan 08, 2025 at 06:33:05PM +0100, Jonas Oberhauser wrote:
Am 1/7/2025 um 5:09 PM schrieb Alan Stern:
Is this really valid? In the example above, if there were no other
references to a or b in the rest of the program, the compiler could
eliminate them entirely.
In the example above, this is not possible, because the address of a/b have
escaped the function and are not deallocated before synchronization happens.
Therefore the compiler must assume that a/b are accessed inside the compiler
barrier.
I'm not quite sure what you mean by that, but if the compiler has access
to the entire program and can do a global analysis then it can recognize
cases where accesses that may seem to be live aren't really.
Even for trivial enough cases where the compiler has access to all the
source, compiler barriers should be opaque to the compiler.
Since it is opaque,
*a = 1;
compiler_barrier();
might as well be
*a = 1;
*d = *a; // *d is in device memory
and so in my opinion the compiler needs to ensure that the value of *a right
before the compiler barrier is 1.
Of course, only if the address of *a could be possibly legally known to the
opaque code in the compiler barrier.
What do you mean by "opaque code in the compiler barrier"? The
compiler_barrier() instruction doesn't generate any code at all; it
merely directs the compiler not to carry any knowledge about values
stored in memory from one side of the barrier to the other.
Note that it does _not_ necessarily prevent the compiler from carrying
knowledge that a memory location is unused from one side of the barrier
to the other.
However, I admit this objection doesn't really apply to Linux kernel
programming.
(Whether the result could count as OOTA is
open to question, but that's not the point.) Is it not possible that a
compiler might find other ways to defeat your intentions?
The main point (which I already mentioned in the previous e-mail) is if the
object is deallocated without synchronization (or never escapes the function
in the first place).
And indeed, any such case renders the added rule unsound. It is in a sense
unrelated to OOTA; cases where the load/store can be elided are never OOTA.
That is a matter of definition. In our paper, Paul and I described
instances of OOTA in which all the accesses have been optimized away as
"trivial".
Yes, by OOTA I mean a rwdep;rfe cycle.
In the absence of data races, such a cycle can't be optimized away because
it is created with volatile/compiler-barrier-protected accesses.
That wasn't true in the C++ context of the paper Paul and I worked on.
Of course, C++ is not our current context here.
What I was trying to get at above is that compiler-barrier protection
does not necessarily guarantee that non-volatile accesses can't be
optimized away. (However, it's probably safe for us to make such an
assumption here.)
It seems the ideal solution is to let Live be defined by the tools, which
should keep up with or exceed the analysis done by state-of-art compilers.
I don't think it works that way in practice. :-)