Re: "Verifying and Optimizing Compact NUMA-Aware Locks on Weak Memory Models"

From: Boqun Feng
Date: Fri Aug 26 2022 - 12:22:08 EST

Next message: Sean Anderson: "retire_capture_urb: Corrected urb data len"
Previous message: Arun.Ramadoss: "Re: [RFC Patch net-next v2] net: dsa: microchip: lan937x: enable interrupt for internal phy link detection"
In reply to: Paul E. McKenney: ""Verifying and Optimizing Compact NUMA-Aware Locks on Weak Memory Models""
Next in thread: Peter Zijlstra: "Re: "Verifying and Optimizing Compact NUMA-Aware Locks on Weak Memory Models""
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Fri, Aug 26, 2022 at 05:48:12AM -0700, Paul E. McKenney wrote:
> Hello!
>
> I have not yet done more than glance at this one, but figured I should
> send it along sooner rather than later.
>
> "Verifying and Optimizing Compact NUMA-Aware Locks on Weak
> Memory Models", Antonio Paolillo, Hernán Ponce-de-León, Thomas
> Haas, Diogo Behrens, Rafael Chehab, Ming Fu, and Roland Meyer.
> https://arxiv.org/abs/2111.15240
>
> The claim is that the queued spinlocks implementation with CNA violates
> LKMM but actually works on all architectures having a formal hardware
> memory model.
>

Translate one of their litmus test into a runnable one (there is a typo
in it):

C queued-spin-lock

(*
* P0 is lock-release
* P1 is xchg_tail()
* P2 is lock-acquire
*)

{}

P0(int *x, atomic_t *l)
{
WRITE_ONCE(*x, 1);
atomic_set_release(l, 1);
}

P1(int *x, atomic_t *l)
{
int val;
int new;
int old;

val = atomic_read(l);
new = val + 2;
old = atomic_cmpxchg_relaxed(l, val, new);
}

P2(int *x, atomic_t *l)
{
int r0 = atomic_read_acquire(l);
int r1 = READ_ONCE(*x);
}

exists (2:r0=3 /\ 2:r1=0)

According to LKMM, the exist-clause could be triggered because:

po-rel; coe: rfe; acq-po

is not happen-before in LKMM. Alan actually explain why at a response to
a GitHub issue:

https://github.com/paulmckrcu/litmus/issues/11#issuecomment-1115235860

(Paste Alan's reply)

"""
As for why the LKMM doesn't require ordering for this test... I do not
believe this is related to 2+2W. Think instead in terms of the LKMM's
operational model:

The store-release in P0 means that the x=1 write will propagate
to each CPU before the y=1 write does.

Since y=3 at the end, we know that y=1 (and hence x=1 too)
propagates to P1 before the addition occurs. And we know that
y=3 propagates to P2 before the load-acquire executes.

But we _don't_ know that either y=1 or x=1 propagates to P2
before y=3 does! If the store in P1 were a store-release then
this would have to be true (as you saw in your tests), but it
isn't.

In other words, the litmus test could execute with the following
temporal ordering:

P0 P1 P2
---------- --------- ----------
Write x=1
Write y=1

[x=1 and y=1 propagate to P1]

Read y=1
Write y=3

[y=3 propagates to P2]

Read y=3
Read x=0

[x=1 and y=1 propagate to P2]

(Note that when y=1 propagates to P2, it doesn't do anything because it
won't overwrite the coherence-later store y=3.)

It may be true that none of the architectures supported by Linux will
allow this outcome for the test (although I suspect that the PPC-weird
version _would_ be allowed if you fixed it). At any rate, disallowing
this result in the memory model would probably require more effort than
would be worthwhile.

Alan
"""

The question is that whether we "fix" LKMM because of this, or we
mention explicitly this is something "unsupported" by LKMM yet?

Regards,
Boqun

> Thoughts?
>
> Thanx, Paul

Next message: Sean Anderson: "retire_capture_urb: Corrected urb data len"
Previous message: Arun.Ramadoss: "Re: [RFC Patch net-next v2] net: dsa: microchip: lan937x: enable interrupt for internal phy link detection"
In reply to: Paul E. McKenney: ""Verifying and Optimizing Compact NUMA-Aware Locks on Weak Memory Models""
Next in thread: Peter Zijlstra: "Re: "Verifying and Optimizing Compact NUMA-Aware Locks on Weak Memory Models""
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]