Re: [RFC] tools/memory-model: Rule out OOTA

From: Jonas Oberhauser
Date: Fri Jan 10 2025 - 11:22:56 EST




Am 1/10/2025 um 3:54 PM schrieb Paul E. McKenney:
On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote:
Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney:
On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote:


Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney:
If I change the two plain assignments to use WRITE_ONCE() as required
by memory-barriers.txt, OOTA is avoided:


I think this direction of inquiry is a bit misleading. There need not be any
speculative store at all:



P0(int *a, int *b, int *x, int *y) {
int r1;
int r2 = 0;
r1 = READ_ONCE(*x);
smp_rmb();
if (r1 == 1) {
r2 = *b;
}
WRITE_ONCE(*a, r2);
smp_wmb();
WRITE_ONCE(*y, 1);
}

P1(int *a, int *b, int *x, int *y) {
int r1;

int r2 = 0;

r1 = READ_ONCE(*y);
smp_rmb();
if (r1 == 1) {
r2 = *a;
}
WRITE_ONCE(*b, r2);
smp_wmb();
WRITE_ONCE(*x, 1);
}


If we want to respect something containing a control dependency to a
WRITE_ONCE() not in the body of the "if" statement, we need to make some
change to memory-barriers.txt.

I'm not sure what you denotate by *this* in "if we change this", but just to
clarify, I am not thinking of claiming that there were a (semantic) control
dependency to WRITE_ONCE(*b, r2) in this example.

There is however a data dependency from r2 = *a to WRITE_ONCE, and I would
say that there is a semantic data (not control) dependency from r1 =
READ_ONCE(*y) to WRITE_ONCE(*b, r2), too: depending on the value read from
*y, the value stored to *b will be different. The latter would be enough to
avoid OOTA according to the mainline LKMM, but currently this semantic
dependency is not detected by herd7.

According to LKMM, address and data dependencies must be headed by
rcu_dereference() or similar. See Documentation/RCU/rcu_dereference.rst.

Therefore, there is nothing to chain the control dependency with.

Note that herd7 does generate dependencies. And speaking informally, there clearly is a semantic dependency.

Both the original formalization of LKMM and my patch do say that a plain load at the head of a dependency chain does not provide any dependency ordering, i.e.,

[Plain & R] ; dep

is never part of hb, both in LKMM and in my patch.


By the way, if your concern is the dependency *starting* from the plain load, then we can look at examples where the dependency starts from a marked load:


r1 = READ_ONCE(*x);
smp_rmb();
if (r1 == 1) {
r2 = READ_ONCE(*a);
}
*b = 1;
smp_wmb();
WRITE_ONCE(*y,1);

This is more or less analogous to the case of the addr ; [Plain] ; wmb case you already have.


I currently can not come up with an example where there would be a
(semantic) control dependency from a load to a store that is not in the arm
of an if statement (or a loop / switch of some form with the branch
depending on the load).

I think the control dependency is just a red herring. It is only there to
avoid the data race.

Well, that red herring needs to have a companion fish to swim with in
order to enforce ordering, and I am not seeing that companion.

Or am I (yet again!) missing something subtle here?



It makes more sense to think about how people do message passing (or seqlock), which might look something like this:

[READ_ONCE]
rmb
[plain read]

and

[plain write]
wmb
[WRITE_ONCE]


Clearly LKMM says that there is some sort of order (not quite happens-before order though) between the READ_ONCE and the plain read, and between the plain write and the WRITE_ONCE. This order is clearly defined in the data race definition, in r-pre-bounded and w-post-bounded.

Now consider

[READ_ONCE]
rmb
[plain read]
// some code that creates order between the plain accesses
[plain write]
wmb
[WRITE_ONCE]

where for some specific reason we can discern that the compiler can not fully eliminate/move across the barrier either this specific plain read, nor the plain write, nor the ordering between the two.

In this case, is there order between the READ_ONCE and the WRITE_ONCE, or not? Of course, we know current LKMM says no. I would say that in those very specific cases, we do have ordering.


In a hypothetical LKMM where reading in a race is not a data race unless the
data is used (*1), this would also work:

You lost me on the "(*1)", which might mean that I am misunderstanding
your text and examples below.

This was meant to be a footnote :D

unsigned int r1;
unsigned int r2 = 0;
r1 = READ_ONCE(*x);
smp_rmb();
r2 = *b;

This load from *b does not head any sort of dependency per LKMM, as noted
in rcu_dereference.rst. As that document states, there are too many games
that compilers are permitted to play with plain C-language loads.

WRITE_ONCE(*a, (~r1 + 1) & r2);
smp_wmb();
WRITE_ONCE(*y, 1);


Here in case r1 == 0, the value of r2 is not used, so there is a race but
there would not be data race in the hypothetical LKMM.

That plain C-language load from b, if concurrent with any sort of store to
b, really is a data race. Sure, a compiler that can prove that r1==0 at
the WRITE_ONCE() to a might optimize that load away, but the C-language
definition of data race still applies.

It is a data race according to C, but so are all races on WRITE_ONCE and READ_ONCE, so we already do not actually care what C says.

What we care about is what the compiler says (and does).

The reality is that no matter what kind of crazy optimizations the compiler does to the load and to the concurrent store, all that would happen is that the load "returns" some insane value. But that insane value is not used by the remainder of the computation.

I think the right way to think about it is that a race between a read and a write gives the read an indeterminate value, and a race between two writes produces undefined behavior. I vaguely recall that this is even guaranteed by LLVM.

That is why sequence locks work, after all. In our internal memory model we have relaxed the definition accordingly and there are a bunch of internally used datastructures that can only be verified because of the relaxation.



I am currently not at all comfortable with the thought of allowing
plain C-language loads to head any sort of dependency. I really did put
that restriction into both memory-barriers.txt and rcu_dereference.rst
intentionally. There is the old saying "Discipline = freedom", and
therefore compilers' lack of discipline surrounding plain C-language
loads implies a lack of freedom. ;-)

Yes, I understand your concern (or more generally, the concern of letting plain accesses play a role in ordering).
Obviously, allowing arbitrary plain loads to invoke some kind of ordering because of a dependency is plain (heh) wrong.
There are two kinds of potential problems:
- the load or its dependent store may not exist in that location at all
- the dependency may not really exist

The second case is a problem also with marked accesses, and should be handled by herd7 only giving us actual semantic dependencies (whatever those are). It can not be solved in cat. Either way it is a limitation that herd7 (and also other tools) currently has and we already live with.

So the new problem we deal with is to somehow restrict the rule to loads and dependent stores that the compiler for whatever reason will not fully eliminate.

This problem too can not be solved completely inside cat. We can give an approximation, as discussed with Alan (stating that a store would not be elided if it is read by another thread, and a read would not be elided if it reads from another thread and a store that won't be elided depends on it).

This approximation is also limited, e.g., if the addresses of the plain loads and stores have not yet escaped the function, but at least this scenario is currently impossible in herd7 (unlike the fake dependency scenario).

In my mind it would again be better to offload the correct selection of "compiler-un(re)movable plain loads and stores" to the tools. That may again not solve the problem fully, but it at least would mean that any changes to address the imprecision wouldn't need to go through the kernel tree, and IMHO it is easier to say LKMM in the cat files is the model, and the interpretation of the model has some limitations.





We should decide which of these examples should be
added to the github litmus archive, perhaps to illustrate the fact that
plain C-language loads do not head dependency chains. Thoughts?

I'm not sure that is a good idea, given that running the herd7 tool on the litmus test will clearly show a dependency chain headed by plain loads in the visual graphs (with `doshow rwdep`).

Maybe it makes more sense to say in the docs that they may head syntactic or semantic dependency chains, but because of the common case that the compiler may cruelly optimize things, LKMM does not guarantee ordering based on the dependency chains headed by plain loads. That would be consistent with the tooling.

[1] https://people.kernel.org/paulmck/hunting-a-tree03-heisenbug

Fun. Thanks :) Duplication and Devious both start with D.

Best wishes
jonas