Re: Documentation for plain accesses and data races

From: Boqun Feng
Date: Mon Sep 16 2019 - 01:18:05 EST


Hi Alan,

I spend some time reading this, really helpful! Thanks.

Please see comments below:

On Fri, Sep 06, 2019 at 02:11:29PM -0400, Alan Stern wrote:
[...]
> If two memory accesses aren't concurrent then one must execute before
> the other. Therefore the LKMM decides two accesses aren't concurrent
> if they can be connected by a sequence of hb, pb, and rb links
> (together referred to as xb, for "executes before"). However, there
> are two complicating factors.
>
> If X is a load and X executes before a store Y, then indeed there is
> no danger of X and Y being concurrent. After all, Y can't have any
> effect on the value obtained by X until the memory subsystem has
> propagated Y from its own CPU to X's CPU, which won't happen until
> some time after Y executes and thus after X executes. But if X is a
> store, then even if X executes before Y it is still possible that X
> will propagate to Y's CPU just as Y is executing. In such a case X
> could very well interfere somehow with Y, and we would have to
> consider X and Y to be concurrent.
>
> Therefore when X is a store, for X and Y to be non-concurrent the LKMM
> requires not only that X must execute before Y but also that X must
> propagate to Y's CPU before Y executes. (Or vice versa, of course, if
> Y executes before X -- then Y must propagate to X's CPU before X
> executes if Y is a store.) This is expressed by the visibility
> relation (vis), where X ->vis Y is defined to hold if there is an
> intermediate event Z such that:
>
> X is connected to Z by a possibly empty sequence of
> cumul-fence links followed by an optional rfe link (if none of
> these links are present, X and Z are the same event),
>

I wonder whehter we could add an optional ->coe or ->fre between X and
the possibly empty sequence of cumul-fence, smiliar to the definition
of ->prop. This makes sense because if we have

X ->coe X' (and X' in on CPU C)

, X must be already propagated to C before X' executed, according to our
operation model:

"... In particular, it must arrange for the store to be co-later than
(i.e., to overwrite) any other store to the same location which has
already propagated to CPU C."

In other words, we can define ->vis as:

let vis = prop ; ((strong-fence ; [Marked] ; xbstar) | (xbstar & int))

, and for this document, reference the "prop" section in
explanation.txt. This could make the model simple (both for description
and explanation), and one better thing is that we won't need commit in
Paul's dev branch:

c683f2c807d2 "tools/memory-model: Fix data race detection for unordered store and load"

, and data race rules will look more symmetrical ;-)

Thoughts? Or am I missing something subtle here?

Regards,
Boqun

> and either:
>
> Z is connected to Y by a strong-fence link followed by a
> possibly empty sequence of xb links,
>
> or:
>
> Z is on the same CPU as Y and is connected to Y by a possibly
> empty sequence of xb links (again, if the sequence is empty it
> means Z and Y are the same event).
>
> The motivations behind this definition are straightforward:
>
> cumul-fence memory barriers force stores that are po-before
> the barrier to propagate to other CPUs before stores that are
> po-after the barrier.
>
> An rfe link from an event W to an event R says that R reads
> from W, which certainly means that W must have propagated to
> R's CPU before R executed.
>
> strong-fence memory barriers force stores that are po-before
> the barrier, or that propagate to the barrier's CPU before the
> barrier executes, to propagate to all CPUs before any events
> po-after the barrier can execute.
>
> To see how this works out in practice, consider our old friend, the MP
> pattern (with fences and statement labels, but without the conditional
> test):
>
> int buf = 0, flag = 0;
>
> P0()
> {
> X: WRITE_ONCE(buf, 1);
> smp_wmb();
> W: WRITE_ONCE(flag, 1);
> }
>
> P1()
> {
> int r1;
> int r2 = 0;
>
> Z: r1 = READ_ONCE(flag);
> smp_rmb();
> Y: r2 = READ_ONCE(buf);
> }
>
> The smp_wmb() memory barrier gives a cumul-fence link from X to W, and
> assuming r1 = 1 at the end, there is an rfe link from W to Z. This
> means that the store to buf must propagate from P0 to P1 before Z
> executes. Next, Z and Y are on the same CPU and the smp_rmb() fence
> provides an xb link from Z to Y (i.e., it forces Z to execute before
> Y). Therefore we have X ->vis Y: X must propagate to Y's CPU before Y
> executes.
>
[...]

Attachment: signature.asc
Description: PGP signature