Re: [RFC] tools/memory-model: Rule out OOTA
From: Jonas Oberhauser
Date: Fri Jan 10 2025 - 11:22:56 EST
Am 1/10/2025 um 3:54 PM schrieb Paul E. McKenney:
On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote:
Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney:
On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote:
Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney:
If I change the two plain assignments to use WRITE_ONCE() as required
by memory-barriers.txt, OOTA is avoided:
I think this direction of inquiry is a bit misleading. There need not be any
speculative store at all:
P0(int *a, int *b, int *x, int *y) {
int r1;
int r2 = 0;
r1 = READ_ONCE(*x);
smp_rmb();
if (r1 == 1) {
r2 = *b;
}
WRITE_ONCE(*a, r2);
smp_wmb();
WRITE_ONCE(*y, 1);
}
P1(int *a, int *b, int *x, int *y) {
int r1;
int r2 = 0;
r1 = READ_ONCE(*y);
smp_rmb();
if (r1 == 1) {
r2 = *a;
}
WRITE_ONCE(*b, r2);
smp_wmb();
WRITE_ONCE(*x, 1);
}
If we want to respect something containing a control dependency to a
WRITE_ONCE() not in the body of the "if" statement, we need to make some
change to memory-barriers.txt.
I'm not sure what you denotate by *this* in "if we change this", but just to
clarify, I am not thinking of claiming that there were a (semantic) control
dependency to WRITE_ONCE(*b, r2) in this example.
There is however a data dependency from r2 = *a to WRITE_ONCE, and I would
say that there is a semantic data (not control) dependency from r1 =
READ_ONCE(*y) to WRITE_ONCE(*b, r2), too: depending on the value read from
*y, the value stored to *b will be different. The latter would be enough to
avoid OOTA according to the mainline LKMM, but currently this semantic
dependency is not detected by herd7.
According to LKMM, address and data dependencies must be headed by
rcu_dereference() or similar. See Documentation/RCU/rcu_dereference.rst.
Therefore, there is nothing to chain the control dependency with.
Note that herd7 does generate dependencies. And speaking informally,
there clearly is a semantic dependency.
Both the original formalization of LKMM and my patch do say that a plain
load at the head of a dependency chain does not provide any dependency
ordering, i.e.,
[Plain & R] ; dep
is never part of hb, both in LKMM and in my patch.
By the way, if your concern is the dependency *starting* from the plain
load, then we can look at examples where the dependency starts from a
marked load:
r1 = READ_ONCE(*x);
smp_rmb();
if (r1 == 1) {
r2 = READ_ONCE(*a);
}
*b = 1;
smp_wmb();
WRITE_ONCE(*y,1);
This is more or less analogous to the case of the addr ; [Plain] ; wmb
case you already have.
I currently can not come up with an example where there would be a
(semantic) control dependency from a load to a store that is not in the arm
of an if statement (or a loop / switch of some form with the branch
depending on the load).
I think the control dependency is just a red herring. It is only there to
avoid the data race.
Well, that red herring needs to have a companion fish to swim with in
order to enforce ordering, and I am not seeing that companion.
Or am I (yet again!) missing something subtle here?
It makes more sense to think about how people do message passing (or
seqlock), which might look something like this:
[READ_ONCE]
rmb
[plain read]
and
[plain write]
wmb
[WRITE_ONCE]
Clearly LKMM says that there is some sort of order (not quite
happens-before order though) between the READ_ONCE and the plain read,
and between the plain write and the WRITE_ONCE. This order is clearly
defined in the data race definition, in r-pre-bounded and w-post-bounded.
Now consider
[READ_ONCE]
rmb
[plain read]
// some code that creates order between the plain accesses
[plain write]
wmb
[WRITE_ONCE]
where for some specific reason we can discern that the compiler can not
fully eliminate/move across the barrier either this specific plain read,
nor the plain write, nor the ordering between the two.
In this case, is there order between the READ_ONCE and the WRITE_ONCE,
or not? Of course, we know current LKMM says no. I would say that in
those very specific cases, we do have ordering.
In a hypothetical LKMM where reading in a race is not a data race unless the
data is used (*1), this would also work:
You lost me on the "(*1)", which might mean that I am misunderstanding
your text and examples below.
This was meant to be a footnote :D
unsigned int r1;
unsigned int r2 = 0;
r1 = READ_ONCE(*x);
smp_rmb();
r2 = *b;
This load from *b does not head any sort of dependency per LKMM, as noted
in rcu_dereference.rst. As that document states, there are too many games
that compilers are permitted to play with plain C-language loads.
WRITE_ONCE(*a, (~r1 + 1) & r2);
smp_wmb();
WRITE_ONCE(*y, 1);
Here in case r1 == 0, the value of r2 is not used, so there is a race but
there would not be data race in the hypothetical LKMM.
That plain C-language load from b, if concurrent with any sort of store to
b, really is a data race. Sure, a compiler that can prove that r1==0 at
the WRITE_ONCE() to a might optimize that load away, but the C-language
definition of data race still applies.
It is a data race according to C, but so are all races on WRITE_ONCE and
READ_ONCE, so we already do not actually care what C says.
What we care about is what the compiler says (and does).
The reality is that no matter what kind of crazy optimizations the
compiler does to the load and to the concurrent store, all that would
happen is that the load "returns" some insane value. But that insane
value is not used by the remainder of the computation.
I think the right way to think about it is that a race between a read
and a write gives the read an indeterminate value, and a race between
two writes produces undefined behavior. I vaguely recall that this is
even guaranteed by LLVM.
That is why sequence locks work, after all. In our internal memory model
we have relaxed the definition accordingly and there are a bunch of
internally used datastructures that can only be verified because of the
relaxation.
I am currently not at all comfortable with the thought of allowing
plain C-language loads to head any sort of dependency. I really did put
that restriction into both memory-barriers.txt and rcu_dereference.rst
intentionally. There is the old saying "Discipline = freedom", and
therefore compilers' lack of discipline surrounding plain C-language
loads implies a lack of freedom. ;-)
Yes, I understand your concern (or more generally, the concern of
letting plain accesses play a role in ordering).
Obviously, allowing arbitrary plain loads to invoke some kind of
ordering because of a dependency is plain (heh) wrong.
There are two kinds of potential problems:
- the load or its dependent store may not exist in that location at all
- the dependency may not really exist
The second case is a problem also with marked accesses, and should be
handled by herd7 only giving us actual semantic dependencies (whatever
those are). It can not be solved in cat. Either way it is a limitation
that herd7 (and also other tools) currently has and we already live with.
So the new problem we deal with is to somehow restrict the rule to loads
and dependent stores that the compiler for whatever reason will not
fully eliminate.
This problem too can not be solved completely inside cat. We can give an
approximation, as discussed with Alan (stating that a store would not be
elided if it is read by another thread, and a read would not be elided
if it reads from another thread and a store that won't be elided depends
on it).
This approximation is also limited, e.g., if the addresses of the plain
loads and stores have not yet escaped the function, but at least this
scenario is currently impossible in herd7 (unlike the fake dependency
scenario).
In my mind it would again be better to offload the correct selection of
"compiler-un(re)movable plain loads and stores" to the tools. That may
again not solve the problem fully, but it at least would mean that any
changes to address the imprecision wouldn't need to go through the
kernel tree, and IMHO it is easier to say LKMM in the cat files is the
model, and the interpretation of the model has some limitations.
We should decide which of these examples should be
added to the github litmus archive, perhaps to illustrate the fact that
plain C-language loads do not head dependency chains. Thoughts?
I'm not sure that is a good idea, given that running the herd7 tool on
the litmus test will clearly show a dependency chain headed by plain
loads in the visual graphs (with `doshow rwdep`).
Maybe it makes more sense to say in the docs that they may head
syntactic or semantic dependency chains, but because of the common case
that the compiler may cruelly optimize things, LKMM does not guarantee
ordering based on the dependency chains headed by plain loads. That
would be consistent with the tooling.
[1] https://people.kernel.org/paulmck/hunting-a-tree03-heisenbug
Fun. Thanks :) Duplication and Devious both start with D.
Best wishes
jonas