Re: [RFC] tools/memory-model: Rule out OOTA

From: Paul E. McKenney
Date: Fri Jan 10 2025 - 09:58:31 EST


On Thu, Jan 09, 2025 at 07:35:19PM +0100, Jonas Oberhauser wrote:
> Am 1/9/2025 um 6:54 PM schrieb Paul E. McKenney:
> > On Wed, Jan 08, 2025 at 08:17:51PM +0100, Jonas Oberhauser wrote:
> > >
> > >
> > > Am 1/8/2025 um 7:09 PM schrieb Paul E. McKenney:
> > > > On Wed, Jan 08, 2025 at 06:39:12PM +0100, Jonas Oberhauser wrote:
> > > > >
> > > > >
> > > > > Am 1/7/2025 um 7:47 PM schrieb Paul E. McKenney:
> > > > > > On Tue, Jan 07, 2025 at 11:09:55AM -0500, Alan Stern wrote:
> > > > > >
> > > > > The patch removes the (actually non-existant) race condition by saying that
> > > > > a critical section that is protected from having a data race with address
> > > > > dependency or rmb/wmb (which LKMM already says works for avoiding data
> > > > > races), is in fact also ordered and therefore has no race condition either.
> > > > >
> > > > > As a side effect :), this happens to fix OOTA in general in LKMM.
> > > >
> > > > Fair point, no data race is flagged.
> > > >
> > > > On the other hand, Documentation/memory-barriers.txt says the following:
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > However, stores are not speculated. This means that ordering -is- provided
> > > > for load-store control dependencies, as in the following example:
> > > >
> > > > q = READ_ONCE(a);
> > > > if (q) {
> > > > WRITE_ONCE(b, 1);
> > > > }
> > > >
> > > > Control dependencies pair normally with other types of barriers.
> > > > That said, please note that neither READ_ONCE() nor WRITE_ONCE()
> > > > are optional! Without the READ_ONCE(), the compiler might combine the
> > > > load from 'a' with other loads from 'a'. Without the WRITE_ONCE(),
> > > > the compiler might combine the store to 'b' with other stores to 'b'.
> > > > Either can result in highly counterintuitive effects on ordering.
> > > >
> > > > ------------------------------------------------------------------------
> > > >
> > > > If I change the two plain assignments to use WRITE_ONCE() as required
> > > > by memory-barriers.txt, OOTA is avoided:
> > >
> > >
> > > I think this direction of inquiry is a bit misleading. There need not be any
> > > speculative store at all:
> > >
> > >
> > >
> > > P0(int *a, int *b, int *x, int *y) {
> > > int r1;
> > > int r2 = 0;
> > > r1 = READ_ONCE(*x);
> > > smp_rmb();
> > > if (r1 == 1) {
> > > r2 = *b;
> > > }
> > > WRITE_ONCE(*a, r2);
> > > smp_wmb();
> > > WRITE_ONCE(*y, 1);
> > > }
> > >
> > > P1(int *a, int *b, int *x, int *y) {
> > > int r1;
> > >
> > > int r2 = 0;
> > >
> > > r1 = READ_ONCE(*y);
> > > smp_rmb();
> > > if (r1 == 1) {
> > > r2 = *a;
> > > }
> > > WRITE_ONCE(*b, r2);
> > > smp_wmb();
> > > WRITE_ONCE(*x, 1);
> > > }
> > >
> > >
> > > The reason that the WRITE_ONCE helps in the speculative store case is that
> > > both its ctrl dependency and the wmb provide ordering, which together
> > > creates ordering between *x and *y.
> >
> > Ah, and that is because LKMM does not enforce control dependencies past
> > the end of the "if" statement. Cute!
> >
> > But memory-barriers.txt requires that the WRITE_ONCE() be within the
> > "if" statement for control dependencies to exist, so LKMM is in agreement
> > with memory-barriers.txt in this case. So again, if we change this,
> > we need to also change memory-barriers.txt.
> > [...]
> > If we want to respect something containing a control dependency to a
> > WRITE_ONCE() not in the body of the "if" statement, we need to make some
> > change to memory-barriers.txt.
>
> I'm not sure what you denotate by *this* in "if we change this", but just to
> clarify, I am not thinking of claiming that there were a (semantic) control
> dependency to WRITE_ONCE(*b, r2) in this example.
>
> There is however a data dependency from r2 = *a to WRITE_ONCE, and I would
> say that there is a semantic data (not control) dependency from r1 =
> READ_ONCE(*y) to WRITE_ONCE(*b, r2), too: depending on the value read from
> *y, the value stored to *b will be different. The latter would be enough to
> avoid OOTA according to the mainline LKMM, but currently this semantic
> dependency is not detected by herd7.

According to LKMM, address and data dependencies must be headed by
rcu_dereference() or similar. See Documentation/RCU/rcu_dereference.rst.

Therefore, there is nothing to chain the control dependency with.

> I currently can not come up with an example where there would be a
> (semantic) control dependency from a load to a store that is not in the arm
> of an if statement (or a loop / switch of some form with the branch
> depending on the load).
>
> I think the control dependency is just a red herring. It is only there to
> avoid the data race.

Well, that red herring needs to have a companion fish to swim with in
order to enforce ordering, and I am not seeing that companion.

Or am I (yet again!) missing something subtle here?

> In a hypothetical LKMM where reading in a race is not a data race unless the
> data is used (*1), this would also work:

You lost me on the "(*1)", which might mean that I am misunderstanding
your text and examples below.

> unsigned int r1;
> unsigned int r2 = 0;
> r1 = READ_ONCE(*x);
> smp_rmb();
> r2 = *b;

This load from *b does not head any sort of dependency per LKMM, as noted
in rcu_dereference.rst. As that document states, there are too many games
that compilers are permitted to play with plain C-language loads.

> WRITE_ONCE(*a, (~r1 + 1) & r2);
> smp_wmb();
> WRITE_ONCE(*y, 1);
>
>
> Here in case r1 == 0, the value of r2 is not used, so there is a race but
> there would not be data race in the hypothetical LKMM.

That plain C-language load from b, if concurrent with any sort of store to
b, really is a data race. Sure, a compiler that can prove that r1==0 at
the WRITE_ONCE() to a might optimize that load away, but the C-language
definition of data race still applies.

Ah, I finally see that (*1) is a footnote.

> This example would also have OOTA under such a hypothetical LKMM, but not
> with my patch, because in the case where r1 == 1,
> READ_ONCE(*x) is seperated by rmb from the load from *b,
> upon which the store to *a depends,
> which itself is seperated by a wmb from the store to WRITE_ONCE(*y,1)
> and this would ensure that READ_ONCE(*x) and WRITE_ONCE(*y,1) can not be
> reordered with each other anymore.
>
>
> (*1= such a definition is not absurd! One needs to allow such races to make
> sequence locks and other similar datastructures well-defined.)

I am currently not at all comfortable with the thought of allowing
plain C-language loads to head any sort of dependency. I really did put
that restriction into both memory-barriers.txt and rcu_dereference.rst
intentionally. There is the old saying "Discipline = freedom", and
therefore compilers' lack of discipline surrounding plain C-language
loads implies a lack of freedom. ;-)

> I currently don't know another way than the if-statement to avoid the data
> race in the program(*2) in the current LKMM, so that's why I rely on it, but
> at least conceptually it is orthogonal to the problem.
>
> (*2=we can avoid the data race flag in herd by using filter, and only
> generating the graphs where r1==1 and there is no data race. But that is
> cheating -- the program is not valid under mainline LKMM.)

Such cheats can be valid in cases where that is how you tell herd7 about
some restriction that it cannot be told otherwise, but in this case, I
agree that this cheat is unhelpful.

> > > Please do look at the OO
> TA graph generated by herd7 for this one, it looks
> > > quite amazing.
> >
> > Given the way this morning is going, I must take your word for it...
>
> That sounds awful :(
> Technical issues?

Nothing awful, just catching up from tracking down a Linux-kernel RCU
bug that fought well [1] and from the holidays.

> With any luck, you can test it on arm's herd7 web interface at
> https://developer.arm.com/herd7 (just don't be like me and type all the code
> first and then change the drop-down selector to Linux - that will reset the
> code window...)

Ah, I have been running them locally, and didn't have time to chase down
the herd7 arguments.

Which reminds me... We should decide which of these examples should be
added to the github litmus archive, perhaps to illustrate the fact that
plain C-language loads do not head dependency chains. Thoughts?

Thanx, Paul

[1] https://people.kernel.org/paulmck/hunting-a-tree03-heisenbug