On Fri, Jan 10, 2025 at 05:21:59PM +0100, Jonas Oberhauser wrote:
Am 1/10/2025 um 3:54 PM schrieb Paul E. McKenney:
On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote:
Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney:
On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote:
If we want to respect something containing a control dependency to a
Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney:
If I change the two plain assignments to use WRITE_ONCE() as required
by memory-barriers.txt, OOTA is avoided:
I think this direction of inquiry is a bit misleading. There need not be any
speculative store at all:
P0(int *a, int *b, int *x, int *y) {
int r1;
int r2 = 0;
r1 = READ_ONCE(*x);
smp_rmb();
if (r1 == 1) {
r2 = *b;
}
WRITE_ONCE(*a, r2);
smp_wmb();
WRITE_ONCE(*y, 1);
}
P1(int *a, int *b, int *x, int *y) {
int r1;
int r2 = 0;
r1 = READ_ONCE(*y);
smp_rmb();
if (r1 == 1) {
r2 = *a;
}
WRITE_ONCE(*b, r2);
smp_wmb();
WRITE_ONCE(*x, 1);
}
WRITE_ONCE() not in the body of the "if" statement, we need to make some
change to memory-barriers.txt.
I'm not sure what you denotate by *this* in "if we change this", but just to
clarify, I am not thinking of claiming that there were a (semantic) control
dependency to WRITE_ONCE(*b, r2) in this example.
There is however a data dependency from r2 = *a to WRITE_ONCE, and I would
say that there is a semantic data (not control) dependency from r1 =
READ_ONCE(*y) to WRITE_ONCE(*b, r2), too: depending on the value read from
*y, the value stored to *b will be different. The latter would be enough to
avoid OOTA according to the mainline LKMM, but currently this semantic
dependency is not detected by herd7.
According to LKMM, address and data dependencies must be headed by
rcu_dereference() or similar. See Documentation/RCU/rcu_dereference.rst.
Therefore, there is nothing to chain the control dependency with.
Note that herd7 does generate dependencies. And speaking informally, there
clearly is a semantic dependency.
Both the original formalization of LKMM and my patch do say that a plain
load at the head of a dependency chain does not provide any dependency
ordering, i.e.,
[Plain & R] ; dep
is never part of hb, both in LKMM and in my patch.
Agreed, LKMM does filter the underlying herd7 dependencies.
By the way, if your concern is the dependency *starting* from the plain
load, then we can look at examples where the dependency starts from a marked
load:
r1 = READ_ONCE(*x);
smp_rmb();
if (r1 == 1) {
r2 = READ_ONCE(*a);
}
*b = 1;
smp_wmb();
WRITE_ONCE(*y,1);
This is more or less analogous to the case of the addr ; [Plain] ; wmb case
you already have.
This is probably a failure of imagination on my part, but I am not
seeing how to create another thread that interacts with that store to
"b" without resulting in a data race.
Ignoring that, I am not seeing much in the way of LKMM dependencies
in that code.
That is why sequence locks work, after all. In our internal memory model we
have relaxed the definition accordingly and there are a bunch of internally
used datastructures that can only be verified because of the relaxation.
Again, it is likely best to model sequence locking directly.
I am currently not at all comfortable with the thought of allowing
plain C-language loads to head any sort of dependency. I really did put
that restriction into both memory-barriers.txt and rcu_dereference.rst
intentionally. There is the old saying "Discipline = freedom", and
therefore compilers' lack of discipline surrounding plain C-language
loads implies a lack of freedom. ;-)
Yes, I understand your concern (or more generally, the concern of letting
plain accesses play a role in ordering).
Obviously, allowing arbitrary plain loads to invoke some kind of ordering
because of a dependency is plain (heh) wrong.
There are two kinds of potential problems:
- the load or its dependent store may not exist in that location at all
- the dependency may not really exist
The second case is a problem also with marked accesses, and should be
handled by herd7 only giving us actual semantic dependencies (whatever those
are). It can not be solved in cat. Either way it is a limitation that herd7
(and also other tools) currently has and we already live with.
This is easy to say. Please see this paper (which Alan also referred to)
for some of the challenges:
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3064r2.pdf
Mark Batty and his group have identified more gotchas. (See the citation
in the above paper.)
So the new problem we deal with is to somehow restrict the rule to loads and
dependent stores that the compiler for whatever reason will not fully
eliminate.
This problem too can not be solved completely inside cat. We can give an
approximation, as discussed with Alan (stating that a store would not be
elided if it is read by another thread, and a read would not be elided if it
reads from another thread and a store that won't be elided depends on it).
This approximation is also limited, e.g., if the addresses of the plain
loads and stores have not yet escaped the function, but at least this
scenario is currently impossible in herd7 (unlike the fake dependency
scenario).
In my mind it would again be better to offload the correct selection of
"compiler-un(re)movable plain loads and stores" to the tools. That may again
not solve the problem fully, but it at least would mean that any changes to
address the imprecision wouldn't need to go through the kernel tree, and
IMHO it is easier to say LKMM in the cat files is the model, and the
interpretation of the model has some limitations.
Which we currently do via marked accesses. ;-)
We should decide which of these examples should be
added to the github litmus archive, perhaps to illustrate the fact that
plain C-language loads do not head dependency chains. Thoughts?
I'm not sure that is a good idea, given that running the herd7 tool on the
litmus test will clearly show a dependency chain headed by plain loads in
the visual graphs (with `doshow rwdep`).
Maybe it makes more sense to say in the docs that they may head syntactic or
semantic dependency chains, but because of the common case that the compiler
may cruelly optimize things, LKMM does not guarantee ordering based on the
dependency chains headed by plain loads. That would be consistent with the
tooling.
Well, LKMM and herd7 have different notions of what constitutes a
dependency, and the LKMM notion is most relevant here. (And herd7
needs its broader definition so as to handle a wide variety of
memory models.)