Re: [PATCH 2/2] tools/memory-model: Add write ordering by release-acquire and by locks
From: Andrea Parri
Date: Fri Jun 22 2018 - 05:07:01 EST
On Thu, Jun 21, 2018 at 01:27:12PM -0400, Alan Stern wrote:
> More than one kernel developer has expressed the opinion that the LKMM
> should enforce ordering of writes by release-acquire chains and by
> locking. In other words, given the following code:
>
> WRITE_ONCE(x, 1);
> spin_unlock(&s):
> spin_lock(&s);
> WRITE_ONCE(y, 1);
>
> or the following:
>
> smp_store_release(&x, 1);
> r1 = smp_load_acquire(&x); // r1 = 1
> WRITE_ONCE(y, 1);
>
> the stores to x and y should be propagated in order to all other CPUs,
> even though those other CPUs might not access the lock s or be part of
> the release-acquire chain. In terms of the memory model, this means
> that rel-rf-acq-po should be part of the cumul-fence relation.
>
> All the architectures supported by the Linux kernel (including RISC-V)
> do behave this way, albeit for varying reasons. Therefore this patch
> changes the model in accordance with the developers' wishes.
>
> Signed-off-by: Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>
This patch changes the "Result" for ISA2+pooncelock+pooncelock+pombonce,
so it should update the corresponding comment/README.
Reviewed-and-Tested-by: Andrea Parri <andrea.parri@xxxxxxxxxxxxxxxxxxxx>
Andrea
>
> ---
>
>
> [as1871]
>
>
> tools/memory-model/Documentation/explanation.txt | 81 +++++++++++++++++++++++
> tools/memory-model/linux-kernel.cat | 2
> 2 files changed, 82 insertions(+), 1 deletion(-)
>
> Index: usb-4.x/tools/memory-model/linux-kernel.cat
> ===================================================================
> --- usb-4.x.orig/tools/memory-model/linux-kernel.cat
> +++ usb-4.x/tools/memory-model/linux-kernel.cat
> @@ -66,7 +66,7 @@ let ppo = to-r | to-w | fence
>
> (* Propagation: Ordering from release operations and strong fences. *)
> let A-cumul(r) = rfe? ; r
> -let cumul-fence = A-cumul(strong-fence | po-rel) | wmb
> +let cumul-fence = A-cumul(strong-fence | po-rel) | wmb | rel-rf-acq-po
> let prop = (overwrite & ext)? ; cumul-fence* ; rfe?
>
> (*
> Index: usb-4.x/tools/memory-model/Documentation/explanation.txt
> ===================================================================
> --- usb-4.x.orig/tools/memory-model/Documentation/explanation.txt
> +++ usb-4.x/tools/memory-model/Documentation/explanation.txt
> @@ -1897,3 +1897,84 @@ non-deadlocking executions. For example
> Is it possible to end up with r0 = 36 at the end? The LKMM will tell
> you it is not, but the model won't mention that this is because P1
> will self-deadlock in the executions where it stores 36 in y.
> +
> +In the LKMM, locks and release-acquire chains cause stores to
> +propagate in order. For example:
> +
> + int x, y, z;
> +
> + P0()
> + {
> + WRITE_ONCE(x, 1);
> + smp_store_release(&y, 1);
> + }
> +
> + P1()
> + {
> + int r1;
> +
> + r1 = smp_load_acquire(&y);
> + WRITE_ONCE(z, 1);
> + }
> +
> + P2()
> + {
> + int r2, r3, r4;
> +
> + r2 = READ_ONCE(z);
> + smp_rmb();
> + r3 = READ_ONCE(x);
> + r4 = READ_ONCE(y);
> + }
> +
> +If r1 = 1 and r2 = 1 at the end, then both r3 and r4 must also be 1.
> +In other words, the smp_store_release() read by the smp_load_acquire()
> +together act as a sort of inter-processor fence, forcing the stores to
> +x and y to propagate to P2 before the store to z does, regardless of
> +the fact that P2 doesn't execute any release or acquire instructions.
> +This conclusion would hold even if P0 and P1 were on the same CPU, so
> +long as r1 = 1.
> +
> +We have mentioned that the LKMM treats locks as acquires and unlocks
> +as releases. Therefore it should not be surprising that something
> +analogous to this ordering also holds for locks:
> +
> + int x, y;
> + spinlock_t s;
> +
> + P0()
> + {
> + spin_lock(&s);
> + WRITE_ONCE(x, 1);
> + spin_unlock(&s);
> + }
> +
> + P1()
> + {
> + int r1;
> +
> + spin_lock(&s);
> + r1 = READ_ONCE(x):
> + WRITE_ONCE(y, 1);
> + spin_unlock(&s);
> + }
> +
> + P2()
> + {
> + int r2, r3;
> +
> + r2 = READ_ONCE(y);
> + smp_rmb();
> + r3 = READ_ONCE(x);
> + }
> +
> +If r1 = 1 at the end (implying that P1's critical section executes
> +after P0's) and r2 = 1, then r3 must be 1; the ordering of the
> +critical sections forces the store to x to propagate to P2 before the
> +store to y does.
> +
> +In both versions of this scenario, the store-propagation ordering is
> +not required by the operational model. However, it does happen on all
> +the architectures supporting the Linux kernel, and kernel developers
> +seem to expect it; they have requested that this behavior be included
> +in the LKMM.
>