### Concurrency with tools/memory-model

Andrea Parri (andrea.parri@amarulasolutions.com)

Kernel Summit 2018

## ... (part of) the LKMM subsystem

- Merged in 4.17
- ~5000 LoC and documentation
- 10 maintainers, 2 reviewers

#### **Motivations**

#### Test it, stupid!

#### Read the fine manual!

```
#define __smp_store_release(p, v)                                   \
do {                                                                \
        union { typeof(*p) __val; char __c[1]; } __u =              \
                { .__val = (__force typeof(*p)) (v) };              \
        compiletime_assert_atomic_type(*p);                         \
        switch (sizeof(*p)) {                                       \
        [...]                                                       \
        case 4:                                                     \
                asm volatile ("stlr %w1, %0"                        \
                                : "=Q" (*p)                         \
                                : "r" (*(__u32 *)__u.__c)           \
                                : "memory");                        \
                break;                                              \
        [...]                                                       \
        }                                                           \
} while (0)
```

## Are you for real?? (and gut feelings...)

- Joe Random Developer is unlikely to review 10+ implementations of `smp_store_release()`.
- Architecture maintainers are unlikely to review Joe's patches for his cool new feature X.

#### The LKMM as an intermediary

The purpose of this document is twofold:

1. to specify the minimum functionality that one can rely on for any particular barrier, and
2. to provide a guide as to how to use the barriers that are available.

(from Documentation/memory-barriers.txt)

# Basic usage

## Litmus tests, aka querying the memory model

#### Basic usage: reachable states

```
$ herd7 -conf linux-kernel.cfg producer-consumer.litmus
Test producer-consumer Allowed
States 2
1:r_data=-1; 1:r_flag=0;
1:r_data=1; 1:r_flag=1;
No
Witnesses
Positive: 0 Negative: 2
Condition exists (1:r_flag=1 /\ 1:r_data=0)
[...]
```

## The LKMM as a formal specification

This memory model can (roughly speaking) be thought of as an automated version of memory-barriers.txt. It is written in the "cat" language, which is executable by the externally provided "herd7" simulator [...]

Paul E. McKenney

#### LIMITATIONS

1. [...] but there is [...] code that uses bare C memory accesses [...] this [...] in turn limits LKMM's ability to accurately model address, control, and data dependencies.
2. Multiple access sizes for a single variable are not supported, and neither are misaligned or partially overlapping accesses.
3. Exceptions and interrupts are not modeled. [...]
4. I/O such as MMIO or DMA is not supported.
5. Self-modifying code [...] is not supported.
6. Complete modeling of all variants of atomic read-modify-write operations, locking primitives, and RCU is not provided. [...]

# **Examples**

#### **Coherence**

This 'exists' clause can NOT be satisfied!

#### **Execution and propagation (release and acquire)**

This 'exists' clause can NOT be satisfied!

#### **Cumulativity**

```
C release-is-(A-)cumulative

{
	int x = 0;
	int y = 0;
}

P0(int *x)
{
	WRITE_ONCE(*x, 1);
}

P1(int *x, int *y)
{
	int r0;

	r0 = READ_ONCE(*x);
	smp_store_release(y, 1);
}

P2(int *x, int *y)
{
	int r0;
	int r1;

	r0 = smp_load_acquire(y);
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 2:r0=1 /\ 2:r1=0)
```

#### **Execution and propagation (full memory barriers)**

```
C store-buffering

{
	int x = 0;
	int y = 0;
}

P0(int *x, int *y)
{
	int r0;

	WRITE_ONCE(*x, 1);
	smp_mb();
	r0 = READ_ONCE(*y);
}

P1(int *x, int *y)
{
	int r0;

	WRITE_ONCE(*y, 1);
	smp_mb();
	r0 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r0=0)
```

This 'exists' clause can NOT be satisfied!

#### **Atomicity**

This 'exists' clause can NOT be satisfied!

#### Mappings to processors

| Primitive | x86 | powerpc | arm64 | riscv |
|---|---|---|---|---|
| `READ_ONCE()` | `mov` | `ldw` | `ldr` | `lw` |
| `smp_load_acquire()` | `mov` | `ldw; lwsync` | `ldar` | `lw; fence r,rw` |
| `smp_store_release()` | `mov` | `lwsync; stw` | `stlr` | `fence rw,w; sw` |
| `smp_mb()` | `lock; addl` | `sync` | `dmb ish` | `fence rw,rw` |
| `atomic_inc()` | `lock; incl` | LL/SC | `stadd` | `amoadd.w` |

I'm now seeing 1% difference between the runs with 0.3% noise for either of them, [...] I still think that is significant

Will Deacon, on `ldr` vs. `ldar` in `rcu_dereference()`

18-32% slower, or 23-47 cycles. [...] So although this test is not a real workload it is a proxy for something people do complain to us about.

Michael Ellerman, on `lwsync` vs. `sync` in `spin_unlock()`

# **Concluding remarks**

### 'the minimum functionality...' we can rely on?

```
unsigned long __xchg_u32(volatile u32 *ptr, u32 new)
{
	[...]
	spin_lock_irqsave(ATOMIC_HASH(ptr), flags);
	prev = *ptr;
	*ptr = new;
	[...]
```

(from arch/sparc/lib/atomic32.c)

In fact, a recent bug (since fixed) caused GCC to incorrectly use this optimization in a volatile store. In the absence of such bugs, use of `WRITE_ONCE()` prevents store tearing [...]

(from Documentation/memory-barriers.txt)

("lightweight sync") The memory barrier provides an ordering function for the storage accesses caused by Load, Store, and dcbz instructions [...] in storage that is [...]

(from Power ISA Version 3.0B, Sect. 4.6.3, p. 873)

On all versions of the Cortex-A9 MPCore processor [...] successive reads from the same location [...] can result in the read values not appearing in program order.

(from Read-after-Read Hazards, ARM Ref. 761319)

### 'a guide as to how to use...' memory barriers?

- Did you consider locking, RCU, ...?
- Write a litmus test, or try to
- Cc: these people...

#### We want to hear from you!

- Maintainers: Alan Stern, Andrea Parri, Will Deacon, Peter Zijlstra, Boqun Feng, Nicholas Piggin, David Howells, Jade Alglave, Luc Maranget, and Paul E. McKenney
- Reviewers: Akira Yokosawa and Daniel Lustig
- Email lists: linux-kernel@vger.kernel.org, linux-arch@vger.kernel.org

```
/* Guarantees that we have nice foo's. */
smp_store_release(&foo->flag, new_flag);
```

```
WRITE_ONCE(foo->data, new_data);            /* A */
[...]
/*
 * Guarantees that we have nice foo's.
 *
 * Orders (A) before (B).
 */
smp_store_release(&foo->flag, new_flag);    /* B */
```

```
WRITE_ONCE(foo->data, new_data);            /* A */
[...]
/*
 * Guarantees that we have nice foo's.
 *
 * Orders (A) before (B). Matches the smp_load_acquire()
 * in consumer() that orders (C) before (D).
 */
smp_store_release(&foo->flag, new_flag);    /* B */
```

```
WRITE_ONCE(foo->data, new_data);            /* A */
[...]
/*
 * Guarantees that we have nice foo's.
 *
 * Orders (A) before (B). Matches the smp_load_acquire()
 * in consumer() that orders (C) before (D).
 *
 * Forbids: (C) reads-from (B) AND (A) overwrites (D).
 */
smp_store_release(&foo->flag, new_flag);    /* B */
```

#### What's next?

- SRCU
- Data races?
- Mixed-size accesses?

#### Thanks!

Faster crap is still crap.

Ingo Molnar

Golden rule #12: When the comments do not match the code, they probably are both wrong ;)

Steven Rostedt
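The commented `smp_store_release()` example in the concluding remarks describes a message-passing pattern: (A) before (B) on the producer side, (C) before (D) on the consumer side. As a rough illustration only, the sketch below re-creates it in userspace with C11 atomics. It assumes a correspondence that is an approximation, because the C11 model and the LKMM are not identical: relaxed atomic stores/loads stand in for `WRITE_ONCE()`/`READ_ONCE()`, a release store for `smp_store_release()`, and an acquire load for `smp_load_acquire()`. The `struct foo` layout and the `producer()`, `consumer()`, and `run_message_passing()` functions are hypothetical.

```c
/*
 * Hypothetical userspace analogue of the commented producer/consumer
 * pattern, using C11 atomics instead of the kernel's primitives:
 *   WRITE_ONCE()/READ_ONCE() ~ relaxed atomic store/load
 *   smp_store_release()      ~ atomic_store_explicit(..., memory_order_release)
 *   smp_load_acquire()       ~ atomic_load_explicit(..., memory_order_acquire)
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared structure, mirroring foo->data and foo->flag. */
struct foo {
	atomic_int data;
	atomic_int flag;
	int r_data;		/* value consumer() observed at (D) */
};

static void *producer(void *arg)
{
	struct foo *foo = arg;

	atomic_store_explicit(&foo->data, 42, memory_order_relaxed);	/* A */
	/*
	 * Orders (A) before (B). Matches the acquire load in consumer()
	 * that orders (C) before (D).
	 */
	atomic_store_explicit(&foo->flag, 1, memory_order_release);	/* B */
	return NULL;
}

static void *consumer(void *arg)
{
	struct foo *foo = arg;

	/* Spin until (C) reads-from (B). */
	while (atomic_load_explicit(&foo->flag, memory_order_acquire) == 0)	/* C */
		;
	foo->r_data = atomic_load_explicit(&foo->data, memory_order_relaxed);	/* D */
	return NULL;
}

/*
 * Runs one producer/consumer round and returns the value read at (D).
 * Because (C) read 1 from (B), release/acquire forbids (D) from missing
 * the store at (A), so the result should always be 42.
 */
int run_message_passing(void)
{
	struct foo foo;
	pthread_t p, c;
	int rc;

	atomic_init(&foo.data, 0);
	atomic_init(&foo.flag, 0);
	foo.r_data = -1;

	rc = pthread_create(&c, NULL, consumer, &foo);
	assert(rc == 0);
	rc = pthread_create(&p, NULL, producer, &foo);
	assert(rc == 0);

	pthread_join(p, NULL);
	pthread_join(c, NULL);	/* join makes foo.r_data visible to this thread */
	return foo.r_data;
}
```

Called from a `main()` (built with something like `cc -std=c11 -pthread`), `run_message_passing()` should always return 42. Keep in mind that repeated runs only sample the outcomes a particular machine happens to produce; enumerating every outcome the model allows is what herd7 does with a litmus test.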