Re: [PATCH 2/3] rust: sync: generic memory barriers

From: Gary Guo

Date: Thu Apr 02 2026 - 20:08:06 EST


On Thu Apr 2, 2026 at 10:49 PM BST, Joel Fernandes wrote:
> Hi Gary,
>
> On 4/2/2026 11:24 AM, Gary Guo wrote:
>> From: Gary Guo <gary@xxxxxxxxxxx>
>>
>> Implement a generic interface for memory barriers (full system/DMA/SMP).
>> The interface uses a parameter to force users to specify their intent with
>> barriers.
>>
>> It provides `Read`, `Write` and `Full` orderings which map to the existing
>> `rmb()`, `wmb()` and `mb()`, but also `Acquire` and `Release`, which are
>> documented to have `LOAD->{LOAD,STORE}` ordering and `{LOAD,STORE}->STORE`
>> ordering respectively, although for now they are still mapped to a full
>> `mb()`. In the future they could be mapped to more efficient forms
>> depending on the architecture. I included them because many users do not
>> need the STORE->LOAD ordering, and using `Acquire`/`Release` makes their
>> intent clearer about which reorderings are to be prevented.
>>
>> A generic interface is used here instead of individual standalone functions
>> to reduce code duplication. For example, the `Acquire` -> `Full` upgrade is
>> uniformly implemented for all three barrier types, and the `CONFIG_SMP`
>> check in `smp_mb` is uniformly implemented for all SMP barriers. This could
>> extend to the `virt_mb` family if it is introduced in the future.
>>
>> Signed-off-by: Gary Guo <gary@xxxxxxxxxxx>
>> ---
>> rust/kernel/sync/atomic/ordering.rs | 2 +-
>> rust/kernel/sync/barrier.rs | 194 ++++++++++++++++++++++++----
>
> IMO this patch should be split up into different patches for CPU vs IO, and
> perhaps even more patches separating out different barrier types.

Given that the different barrier types are quite closely related, I don't want
to create a bunch of small patches, each adding one. But splitting out the SMP
change and then adding the DMA/mandatory barriers as another patch does sound
reasonable.
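As an aside, for anyone following along: the dispatch structure the commit
message describes can be modeled standalone, outside the kernel. This is only a
sketch of the pattern, not the kernel code; in the real patch `run()` calls the
C barrier bindings and returns nothing, while here it returns the name of the
primitive it would map to, so the dispatch is visible:

```rust
// Standalone model of the patch's design; NOT the kernel code.
// `Smp` and the orderings are zero-sized types, so `smp_mb(Read)`
// monomorphizes to a direct call with no runtime dispatch.

pub struct Smp; // barrier flavor; the patch also covers full-system and DMA

pub struct Read;
pub struct Write;
pub struct Full;

// The flavor is a type parameter of the trait, so one ordering type can
// implement barriers for several flavors.
pub trait MemoryBarrier<Flavor> {
    // Kernel version: `fn run()` calling a C binding; the string return
    // here is purely illustrative.
    fn run() -> &'static str;
}

impl MemoryBarrier<Smp> for Read {
    fn run() -> &'static str { "smp_rmb" }
}

impl MemoryBarrier<Smp> for Write {
    fn run() -> &'static str { "smp_wmb" }
}

impl MemoryBarrier<Smp> for Full {
    fn run() -> &'static str { "smp_mb" }
}

// The ordering argument forces callers to state their intent at the call
// site, e.g. `smp_mb(Read)`.
pub fn smp_mb<O: MemoryBarrier<Smp>>(_order: O) -> &'static str {
    O::run()
}
```

The point of the shape is that upgrades (e.g. `Acquire` -> `Full`) or
`CONFIG_SMP` checks can be written once against the trait rather than once per
barrier function.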

>
>> 2 files changed, 168 insertions(+), 28 deletions(-)
>>
>> diff --git a/rust/kernel/sync/atomic/ordering.rs b/rust/kernel/sync/atomic/ordering.rs
>> index 3f103aa8db99..c4e732e7212f 100644
>> --- a/rust/kernel/sync/atomic/ordering.rs
>> +++ b/rust/kernel/sync/atomic/ordering.rs
> [...]
>> +// Currently the kernel only supports `rmb`, `wmb` and a full `mb`.
>> +impl MemoryBarrier<Smp> for Read {
>> + #[inline]
>> + fn run() {
>> + // SAFETY: `smp_rmb()` is safe to call.
>> + unsafe { bindings::smp_rmb() };
>> + }
>> +}
>> +
>> +impl MemoryBarrier<Smp> for Write {
>> + #[inline]
>> + fn run() {
>> // SAFETY: `smp_wmb()` is safe to call.
>> unsafe { bindings::smp_wmb() };
>> - } else {
>> - barrier();
>> }
>> }
>>
>> -/// A read-read memory barrier.
>> +impl MemoryBarrier<Smp> for Full {
>> + #[inline]
>> + fn run() {
>> + // SAFETY: `smp_mb()` is safe to call.
>> + unsafe { bindings::smp_mb() };
>> + }
>> +}
>> +
>> +/// Memory barrier.
>> ///
>> -/// A barrier that prevents compiler and CPU from reordering memory read accesses across the
>> -/// barrier.
>> -#[inline(always)]
>> -pub fn smp_rmb() {
>> +/// A barrier that prevents compiler and CPU from reordering memory accesses across the barrier.
>> +///
>> +/// The specific forms of reordering can be specified using the parameter.
>> +/// - `mb(Read)` provides a read-read barrier.
>> +/// - `mb(Write)` provides a write-write barrier.
>> +/// - `mb(Full)` provides a full barrier.
>> +/// - `mb(Acquire)` prevents preceding reads from being ordered against succeeding memory
>> +/// operations.
>> +/// - `mb(Release)` prevents preceding memory operations from being ordered against succeeding
>> +/// writes.
>
> I don't agree with this definition of Release. Release is always associated
> with a specific store, likewise Acquire with a load. The definition above
> also doesn't make sense: 'prevents preceding memory operations from being
> ordered against succeeding writes' is not what Release semantics are.
> Release orders memory operations against the specific memory operation
> associated with the Release. Same for Acquire.

I worded my commit message to say that these are about *intentions*, not
semantics. I do want to change the semantics too, but that work is not fully
ready yet. Since you seem interested, though, let me share it as well (perhaps
a bit prematurely).

>
> See also in Documentation/memory-barriers.txt, ACQUIRE and RELEASE are defined as being
> tied to specific memory operations.

That's what we have today, hence the implementation upgrades them to full
memory barriers. But ACQUIRE and RELEASE orderings don't *need* to be tied to
specific memory operations; they can still make conceptual sense as standalone
barriers.

The C11 memory model defines Acquire and Release fences, and it looks to me
like it is relatively easy to add them to LKMM. I was playing with Herd7 and I
think I've got it working; see the attached diff.

Another thing I'd like to note is that on all architectures we have today
except ARM and PARISC, smp_load_acquire and smp_store_release are actually
implemented as READ_ONCE + ACQUIRE barrier and RELEASE barrier + WRITE_ONCE.
I'm planning to propose a C API and the corresponding memory model change too,
but I want to gather some more concrete numbers first (the performance benefit
of having dma_mb_acquire/dma_mb_release compared to a full dma_mb) before
doing so.
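To illustrate that decomposition in C11-style fence terms (a sketch with
illustrative names, not the kernel API): a load-acquire is a relaxed load
followed by a standalone acquire fence, and a store-release is a standalone
release fence followed by a relaxed store. In Rust syntax, whose fences have
the same semantics as C11's atomic_thread_fence:

```rust
use std::sync::atomic::{fence, AtomicI32, Ordering};

// Sketch only: illustrative names, not the kernel API.

// READ_ONCE + standalone ACQUIRE barrier.
fn load_acquire(p: &AtomicI32) -> i32 {
    let v = p.load(Ordering::Relaxed); // READ_ONCE()
    fence(Ordering::Acquire);          // ACQUIRE barrier
    v
}

// Standalone RELEASE barrier + WRITE_ONCE.
fn store_release(p: &AtomicI32, v: i32) {
    fence(Ordering::Release);          // RELEASE barrier
    p.store(v, Ordering::Relaxed);     // WRITE_ONCE()
}
```

In the message-passing pattern, a producer fills a buffer and then does a
store_release of the flag; a consumer whose load_acquire of the flag observes 1
is guaranteed to also see the buffer contents. That is exactly the property the
attached litmus test checks for the proposed smp_mb_acquire/smp_mb_release.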

Note that marked dma_load_acquire/dma_store_release (and their mandatory
versions) don't make much sense, as AFAIK no architecture has instructions for
them, so they would be implemented as fence instructions anyway.

Best,
Gary

diff --git a/tools/memory-model/linux-kernel.bell b/tools/memory-model/linux-kernel.bell
index fe65998002b9..9b3322fc5b9c 100644
--- a/tools/memory-model/linux-kernel.bell
+++ b/tools/memory-model/linux-kernel.bell
@@ -25,6 +25,8 @@ instructions RMW[Accesses]
enum Barriers = 'wmb (*smp_wmb*) ||
'rmb (*smp_rmb*) ||
'MB (*smp_mb*) ||
+ 'mb-acquire (*smp_mb_acquire*) ||
+ 'mb-release (*smp_mb_release*) ||
'barrier (*barrier*) ||
'rcu-lock (*rcu_read_lock*) ||
'rcu-unlock (*rcu_read_unlock*) ||
diff --git a/tools/memory-model/linux-kernel.cat b/tools/memory-model/linux-kernel.cat
index d7e7bf13c831..d7be7c03bb25 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -33,6 +33,8 @@ let po-unlock-lock-po = po ; [UL] ; (po|rf) ; [LKR] ; po
let R4rmb = R \ Noreturn (* Reads for which rmb works *)
let rmb = [R4rmb] ; fencerel(Rmb) ; [R4rmb]
let wmb = [W] ; fencerel(Wmb) ; [W]
+let mb-acquire = [R4rmb] ; fencerel(Mb-acquire) ; [M]
+let mb-release = [M] ; fencerel(Mb-release) ; [W]
let mb = ([M] ; fencerel(Mb) ; [M]) |
(*
* full-barrier RMWs (successful cmpxchg(), xchg(), etc.) act as
@@ -64,9 +66,10 @@ let mb = ([M] ; fencerel(Mb) ; [M]) |
let gp = po ; [Sync-rcu | Sync-srcu] ; po?
let strong-fence = mb | gp

-let nonrw-fence = strong-fence | po-rel | acq-po
+let nonrw-fence = strong-fence | po-rel | acq-po | mb-acquire | mb-release
let fence = nonrw-fence | wmb | rmb
-let barrier = fencerel(Barrier | Rmb | Wmb | Mb | Sync-rcu | Sync-srcu |
+let barrier = fencerel(Barrier | Rmb | Wmb | Mb | Mb-acquire | Mb-release |
+ Sync-rcu | Sync-srcu |
Before-atomic | After-atomic | Acquire | Release |
Rcu-lock | Rcu-unlock | Srcu-lock | Srcu-unlock) |
(po ; [Release]) | ([Acquire] ; po)
@@ -97,7 +100,7 @@ let ppo = to-r | to-w | (fence & int) | (po-unlock-lock-po & int)
(* Propagation: Ordering from release operations and strong fences. *)
let A-cumul(r) = (rfe ; [Marked])? ; r
let rmw-sequence = (rf ; rmw)*
-let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel) | wmb |
+let cumul-fence = [Marked] ; (A-cumul(strong-fence | po-rel | mb-release) | wmb |
po-unlock-lock-po) ; [Marked] ; rmw-sequence
let prop = [Marked] ; (overwrite & ext)? ; cumul-fence* ;
[Marked] ; rfe? ; [Marked]
diff --git a/tools/memory-model/linux-kernel.def b/tools/memory-model/linux-kernel.def
index 49e402782e49..e32aea2c01a9 100644
--- a/tools/memory-model/linux-kernel.def
+++ b/tools/memory-model/linux-kernel.def
@@ -20,6 +20,8 @@ smp_store_mb(X,V) { __store{ONCE}(X,V); __fence{MB}; }
smp_mb() { __fence{MB}; }
smp_rmb() { __fence{rmb}; }
smp_wmb() { __fence{wmb}; }
+smp_mb_acquire() { __fence{mb-acquire}; }
+smp_mb_release() { __fence{mb-release}; }
smp_mb__before_atomic() { __fence{before-atomic}; }
smp_mb__after_atomic() { __fence{after-atomic}; }
smp_mb__after_spinlock() { __fence{after-spinlock}; }
diff --git a/tools/memory-model/litmus-tests/MP+fencembreleaseonceonce+fencembacquireonceonce.litmus b/tools/memory-model/litmus-tests/MP+fencembreleaseonceonce+fencembacquireonceonce.litmus
new file mode 100644
index 000000000000..d53b848c5687
--- /dev/null
+++ b/tools/memory-model/litmus-tests/MP+fencembreleaseonceonce+fencembacquireonceonce.litmus
@@ -0,0 +1,30 @@
+C MP+fencembreleaseonceonce+fencembacquireonceonce
+
+(*
+ * Result: Never
+ *
+ * This litmus test demonstrates that smp_mb_release() and
+ * smp_mb_acquire() provide sufficient ordering for the message-passing
+ * pattern.
+ *)
+
+{}
+
+P0(int *buf, int *flag) // Producer
+{
+ WRITE_ONCE(*buf, 1);
+ smp_mb_release();
+ WRITE_ONCE(*flag, 1);
+}
+
+P1(int *buf, int *flag) // Consumer
+{
+ int r0;
+ int r1;
+
+ r0 = READ_ONCE(*flag);
+ smp_mb_acquire();
+ r1 = READ_ONCE(*buf);
+}
+
+exists (1:r0=1 /\ 1:r1=0) (* Bad outcome. *)