Re: [RFC] tools/memory-model: Rule out OOTA
From: Jonas Oberhauser
Date: Wed Jan 08 2025 - 14:18:28 EST
Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney:
On Wed, Jan 08, 2025 at 06:39:12PM +0100, Jonas Oberhauser wrote:
Am 1/7/2025 um 7:47 PM schrieb Paul E. McKenney:
On Tue, Jan 07, 2025 at 11:09:55AM -0500, Alan Stern wrote:
(For example, in a presentation to the C++ working group last year, Paul
and I didn't try to show how to extend the C++ memory model to exclude
OOTA [other than by fiat, as it does now]. Instead we argued that with
the existing memory model, no reasonable compiler would ever produce an
executable that could exhibit OOTA and so the memory model didn't need
to be changed.)
Furthermore, the LKMM design choice was that if a given litmus test was
flagged as having a data race, anything might happen, including OOTA.
Note that there is no data race in this litmus test.
There is a race condition on plain accesses according to LKMM,
but LKMM also says that this is *not* a data race.
The patch removes the (actually non-existant) race condition by saying that
a critical section that is protected from having a data race with address
dependency or rmb/wmb (which LKMM already says works for avoiding data
races), is in fact also ordered and therefore has no race condition either.
As a side effect :), this happens to fix OOTA in general in LKMM.
Fair point, no data race is flagged.
On the other hand, Documentation/memory-barriers.txt says the following:
------------------------------------------------------------------------
However, stores are not speculated. This means that ordering -is- provided
for load-store control dependencies, as in the following example:
q = READ_ONCE(a);
if (q) {
WRITE_ONCE(b, 1);
}
Control dependencies pair normally with other types of barriers.
That said, please note that neither READ_ONCE() nor WRITE_ONCE()
are optional! Without the READ_ONCE(), the compiler might combine the
load from 'a' with other loads from 'a'. Without the WRITE_ONCE(),
the compiler might combine the store to 'b' with other stores to 'b'.
Either can result in highly counterintuitive effects on ordering.
------------------------------------------------------------------------
If I change the two plain assignments to use WRITE_ONCE() as required
by memory-barriers.txt, OOTA is avoided:
I think this direction of inquiry is a bit misleading. There need not be
any speculative store at all:
P0(int *a, int *b, int *x, int *y) {
int r1;
int r2 = 0;
r1 = READ_ONCE(*x);
smp_rmb();
if (r1 == 1) {
r2 = *b;
}
WRITE_ONCE(*a, r2);
smp_wmb();
WRITE_ONCE(*y, 1);
}
P1(int *a, int *b, int *x, int *y) {
int r1;
int r2 = 0;
r1 = READ_ONCE(*y);
smp_rmb();
if (r1 == 1) {
r2 = *a;
}
WRITE_ONCE(*b, r2);
smp_wmb();
WRITE_ONCE(*x, 1);
}
The reason that the WRITE_ONCE helps in the speculative store case is
that both its ctrl dependency and the wmb provide ordering, which
together creates ordering between *x and *y.
I should point out that a version of herd7 that respects semantic
dependencies (instead of syntactic only) might solve this case, by
figuring out that the WRITE_ONCE to *a resp. *b depends on the first
READ_ONCE.
Here's another funny example:
P0(int *a, int *b, int *x, int *y) {
int r1;
r1 = READ_ONCE(*x);
smp_rmb();
int r2 = READ_ONCE(*b);
if (r1 == 1) {
r2 = *b;
}
WRITE_ONCE(*a, r2);
smp_wmb();
WRITE_ONCE(*y, 1);
}
P1(int *a, int *b, int *x, int *y) {
int r1;
r1 = READ_ONCE(*y);
smp_rmb();
int r2 = READ_ONCE(*a);
if (r1 == 1) {
r2 = *a;
}
WRITE_ONCE(*b, r2);
smp_wmb();
WRITE_ONCE(*x, 1);
}
exists (0:r1=1 /\ 1:r1=1)
Is there still a semantic dependency from the inner load to the store to
*a resp. *b, especially since the outer load from *b resp. *a is reading
from the same store as the inner one? The compiler is definitely allowed
to eliminate the inner load, which *also removes the OOTA*.
Please do look at the OOTA graph generated by herd7 for this one, it
looks quite amazing.
If LKMM is to allow plain assignments in this case, we need to also update
memory-barriers.txt.
But I am not suggesting to allow the plain assignment *by itself*.
In particular, my patch does not enforce any happens-before order
between the READ_ONCE(*x) and the plain assignment to *b.
It only provides order between READ_ONCE(*x) and WRITE_ONCE(*y,...),
through dependencies in the plain critical section.
Which must be 1) properly guarded (e.g., by rmb/wmb) and 2) live.
Because of this, I don't know if the text needs much updating, although
one could add a text in the direction that "in the rare case where
compilers do guarantee that a load and dependent store (including plain)
will be emitted in some form, one can use rmb and wmb to ensure the
order of surrounding marked accesses".
> I am reluctant to do this because the community> needs to trust plain
C-language assignments less rather than more,
especially given that compilers are continuing to become more aggressive.
Yes, I agree.
Yes, in your example, the "if" and the two explicit barriers should
prevent compilers from being too clever, but these sorts of things are
more fragile than one might think given future code changes.
Thoughts?
We certainly need to be very careful about how to formalize what the
compiler is allowed of doing and what it is not. And even more careful
about how to communicate this.
Best wishes,
jonas