Re: [tip:locking/core] tools/memory-model: Add extra ordering for locks and remove it for ordinary release/acquire

From: Boqun Feng
Date: Sat Sep 11 2021 - 20:28:37 EST


On Fri, Sep 10, 2021 at 12:36:32PM -0400, Alan Stern wrote:
> On Fri, Sep 10, 2021 at 10:20:13PM +0800, Boqun Feng wrote:
> > On Thu, Sep 09, 2021 at 11:00:05AM -0700, Paul E. McKenney wrote:
> > [...]
> > >
> > > Boqun, I vaguely remember a suggested change from you along these lines,
> > > but now I cannot find it. Could you please send it as a formal patch
> > > if you have not already done so or point me at it if you have?
> > >
> >
> > Here is a draft patch based on the change I made when discussing with
> > Peter, and I really want to hear Alan's thoughts first. Ideally, we
> > should also have related litmus tests and send them to the linux-arch
> > list, so that we know the ordering is provided by every architecture.
> >
> > Regards,
> > Boqun
> >
> > --------------------------------->8
> > Subject: [PATCH] tools/memory-model: Provide extra ordering for
> > lock-{release,acquire} on the same CPU
> >
> > A recent discussion[1] shows that we are in favor of strengthening the
> > ordering of lock-release + lock-acquire on the same CPU: a lock-release
> > and a po-after lock-acquire should provide the so-called RCtso ordering,
> > that is, a memory access S po-before the lock-release should be ordered
> > against a memory access R po-after the lock-acquire, unless S is a store
> > and R is a load.
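> >
> > For example, the one case left unordered is a store po-before the
> > lock-release against a load po-after the lock-acquire. A quick
> > SB-shaped sketch of that exception (untested, and the name is made
> > up; I'd expect "Sometimes" both before and after this change):
> >
> > 	C unlock-lock-write-to-read
> >
> > 	{}
> >
> > 	P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
> > 	{
> > 		int r1;
> >
> > 		spin_lock(s);
> > 		WRITE_ONCE(*x, 1);	/* S is a store ... */
> > 		spin_unlock(s);
> > 		spin_lock(t);
> > 		r1 = READ_ONCE(*y);	/* ... and R is a load */
> > 		spin_unlock(t);
> > 	}
> >
> > 	P1(int *x, int *y)
> > 	{
> > 		int r2;
> >
> > 		WRITE_ONCE(*y, 1);
> > 		smp_mb();
> > 		r2 = READ_ONCE(*x);
> > 	}
> >
> > 	exists (0:r1=0 /\ 1:r2=0)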
> >
> > The strengthening meets programmers' expectation that a "sequence of
> > two locked regions [is] to be ordered wrt each other" (from Linus),
> > and can reduce the mental burden when using locks. Therefore add it
> > to the LKMM.
> >
> > [1]: https://lore.kernel.org/lkml/20210909185937.GA12379@xxxxxxxxxxxxxxxxxxx/
> >
> > Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
> > ---
>
> The change to linux-kernel.cat looks fine. However, I didn't like your
> update to explanation.txt. Instead I wrote my own, given below.
>

Thanks. Indeed, your version of the explanation is better.
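
For reference, the gist of my cat file change is to extend the existing
po-unlock-rf-lock-po relation so that the unlock and the lock need only
be connected by po, roughly (take the exact spelling with a grain of
salt until the formal patch is out):

	-let po-unlock-rf-lock-po = po ; [UL] ; rf ; [LKR] ; po
	+let po-unlock-lock-po = po ; [UL] ; (po|rf) ; [LKR] ; po

with the uses of the old relation renamed accordingly.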

> I also wrote a couple of litmus tests which Paul can add to the
> appropriate archive. They are attached to this email. As expected,
> they fail (result Sometimes) with the current LKMM and succeed (Never)
> with Boqun's updated model.
>

Appreciate it. I will put together your change to explanation.txt (with
the typo fixed), my change to the cat file, and the litmus tests, and
send a proper patch next Monday.
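
While at it, it might be worth a companion test showing the contrast
with ordinary release/acquire on the same CPU, which should stay
"Sometimes" under both the current and the updated model. An untested
sketch along the lines of the fence example in explanation.txt (the
name is made up):

	C rel-acq-rw

	(*
	 * Result: Sometimes
	 *
	 * Ordinary release/acquire on the same CPU do not provide the
	 * unlock+lock ordering: the acquire can forward from the
	 * po-earlier release, so the accesses around them may reorder.
	 *)

	{}

	P0(int *s, int *x, int *y)
	{
		int r1;
		int r3;

		r1 = READ_ONCE(*x);
		smp_store_release(s, 1);
		r3 = smp_load_acquire(s);
		WRITE_ONCE(*y, 1);
	}

	P1(int *x, int *y)
	{
		int r2;

		r2 = smp_load_acquire(y);
		WRITE_ONCE(*x, 1);
	}

	exists (0:r1=1 /\ 1:r2=1)

The tests can be checked with herd7 from tools/memory-model in the
usual way, e.g.:

	herd7 -conf linux-kernel.cfg ullk-rw.litmus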

Regards,
Boqun

> Alan
>
>
> --- usb-devel.orig/tools/memory-model/Documentation/explanation.txt
> +++ usb-devel/tools/memory-model/Documentation/explanation.txt
> @@ -1813,15 +1813,16 @@ spin_trylock() -- we can call these thin
> lock-acquires -- have two properties beyond those of ordinary releases
> and acquires.
>
> -First, when a lock-acquire reads from a lock-release, the LKMM
> -requires that every instruction po-before the lock-release must
> -execute before any instruction po-after the lock-acquire. This would
> -naturally hold if the release and acquire operations were on different
> -CPUs, but the LKMM says it holds even when they are on the same CPU.
> -For example:
> +First, when a lock-acquire reads from or is po-after a lock-release,
> +the LKMM requires that every instruction po-before the lock-release
> +must execute before any instruction po-after the lock-acquire. This
> +would naturally hold if the release and acquire operations were on
> +different CPUs and accessed the same lock variable, but the LKMM says
> +it also holds when they are on the same CPU, even if they access
> +different lock variables. For example:
>
> int x, y;
> - spinlock_t s;
> + spinlock_t s, t;
>
> P0()
> {
> @@ -1830,9 +1831,9 @@ For example:
> spin_lock(&s);
> r1 = READ_ONCE(x);
> spin_unlock(&s);
> - spin_lock(&s);
> + spin_lock(&t);
> r2 = READ_ONCE(y);
> - spin_unlock(&s);
> + spin_unlock(&t);
> }
>
> P1()
> @@ -1842,10 +1843,10 @@ For example:
> WRITE_ONCE(x, 1);
> }
>
> -Here the second spin_lock() reads from the first spin_unlock(), and
> -therefore the load of x must execute before the load of y. Thus we
> -cannot have r1 = 1 and r2 = 0 at the end (this is an instance of the
> -MP pattern).
> +Here the second spin_lock() is po-after the first spin_unlock(), and
> +therefore the load of x must execute before the load of y, even though
> +the two locking operations use different locks. Thus we cannot have
> +r1 = 1 and r2 = 0 at the end (this is an instance of the MP pattern).
>
> This requirement does not apply to ordinary release and acquire
> fences, only to lock-related operations. For instance, suppose P0()
> @@ -1872,13 +1873,13 @@ instructions in the following order:
>
> and thus it could load y before x, obtaining r2 = 0 and r1 = 1.
>
> -Second, when a lock-acquire reads from a lock-release, and some other
> -stores W and W' occur po-before the lock-release and po-after the
> -lock-acquire respectively, the LKMM requires that W must propagate to
> -each CPU before W' does. For example, consider:
> +Second, when a lock-acquire reads from or is po-after a lock-release,
> +and some other stores W and W' occur po-before the lock-release and
> +po-after the lock-acquire respectively, the LKMM requires that W must
> +propagate to each CPU before W' does. For example, consider:
>
> int x, y;
> - spinlock_t x;
> + spinlock_t s;
>
> P0()
> {
> @@ -1908,7 +1909,12 @@ each CPU before W' does. For example, c
>
> If r1 = 1 at the end then the spin_lock() in P1 must have read from
> the spin_unlock() in P0. Hence the store to x must propagate to P2
> -before the store to y does, so we cannot have r2 = 1 and r3 = 0.
> +before the store to y does, so we cannot have r2 = 1 and r3 = 0. But
> +if P1 had used a lock variable different from s, the writes could have
> +propagated in either order. (On the other hand, if the code in P0 and
> +P1 had all executed on a single CPU, as in the example before this
> +one, then the writes would have propagated in order even if the two
> +critical sections used different lock variables.)
>
> These two special requirements for lock-release and lock-acquire do
> not arise from the operational model. Nevertheless, kernel developers
>

> C ullk-rw
>
> (*
> * Result: Never
> *
> * If two locked critical sections execute on the same CPU, all accesses
> * in the first must execute before any accesses in the second, even if
> * the critical sections are protected by different locks.
> *)
>
> {}
>
> P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
> {
> int r1;
>
> spin_lock(s);
> r1 = READ_ONCE(*x);
> spin_unlock(s);
> spin_lock(t);
> WRITE_ONCE(*y, 1);
> spin_unlock(t);
> }
>
> P1(int *x, int *y)
> {
> int r2;
>
> r2 = smp_load_acquire(y);
> WRITE_ONCE(*x, 1);
> }
>
> exists (0:r1=1 /\ 1:r2=1)

> C ullk-ww
>
> (*
> * Result: Never
> *
> * If two locked critical sections execute on the same CPU, stores in the
> * first must propagate to each CPU before stores in the second do, even if
> * the critical sections are protected by different locks.
> *)
>
> {}
>
> P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
> {
> spin_lock(s);
> WRITE_ONCE(*x, 1);
> spin_unlock(s);
> spin_lock(t);
> WRITE_ONCE(*y, 1);
> spin_unlock(t);
> }
>
> P1(int *x, int *y)
> {
> int r1;
> int r2;
>
> r1 = READ_ONCE(*y);
> smp_rmb();
> r2 = READ_ONCE(*x);
> }
>
> exists (1:r1=1 /\ 1:r2=0)