Re: [PATCH v2] rust: page: add byte-wise atomic memory copy methods

From: Alice Ryhl

Date: Tue Feb 17 2026 - 05:03:08 EST

On Tue, Feb 17, 2026 at 10:45:15AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 17, 2026 at 09:33:40AM +0000, Alice Ryhl wrote:
> > On Tue, Feb 17, 2026 at 10:13:48AM +0100, Peter Zijlstra wrote:
> > > On Fri, Feb 13, 2026 at 08:19:17AM -0800, Boqun Feng wrote:
> > > > Well, in standard C, technically memcpy() has the same problem as Rust's
> > > > `core::ptr::copy()` and `core::ptr::copy_nonoverlapping()`, i.e. they
> > > > are vulnerable to data races. Our in-kernel memcpy() on the other hand
> > > > doesn't have this problem. Why? Because it's volatile byte-wise atomic
> > > > per the implementation.
> > >
> > > Look at arch/x86/lib/memcpy_64.S, plenty of movq variants there. Not
> > > byte-wise.
> >
> > movq is a valid implementation of 8 byte-wise copies.
> >
> > > Also, not a single atomic operation in sight.
> >
> > Relaxed atomics are just mov ops.
>
> They are not atomics at all.

Atomic loads and stores are just mov ops, right? Sure, RMW operations do
more complex stuff, but I'm pretty sure that relaxed atomic loads/stores
generally are compiled as mov ops.

> Somewhere along the line 'atomic' seems to have lost any and all meaning
> :-(
>
> It must be this C committee and their weasel speak for fear of reality
> that has infected everyone or somesuch.
>
> Anyway, all you really want is a normal memcpy and somehow Rust cannot
> provide? WTF?!

Forget about Rust for a moment.

Consider this code:

    // Is this ok?
    unsigned long *a, b;
    b = *a;
    if (is_valid(b)) {
        // do stuff
    }

I can easily imagine that LLVM might optimize this into:

    // Uh oh!
    unsigned long *a, b;
    b = *a;
    if (is_valid(*a)) { // <- this was "optimized"
        // do stuff
    }

the argument being that you used an ordinary load of `*a`, so the
compiler may assume that there are no concurrent writes, and therefore
both reads are guaranteed to return the same value.

So if `*a` might be concurrently modified, then we are unhappy.

Of course, if *a is replaced with an atomic load such as READ_ONCE(*a),
this optimization can no longer occur.

    // OK!
    unsigned long *a, b;
    b = READ_ONCE(*a);
    if (is_valid(b)) {
        // do stuff
    }
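
To make the READ_ONCE() case concrete, here is a rough userspace sketch.
READ_ONCE_SKETCH() is a simplified stand-in I made up for illustration
(the real kernel macro does more type checking), and is_valid() and
check_once() are hypothetical helpers. It assumes GCC or Clang for
__typeof__.

```c
/* Simplified stand-in for the kernel's READ_ONCE(): a volatile load.
 * Assumption: GCC/Clang, since __typeof__ is a compiler extension. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical validity check, only here to give the branch a condition. */
static int is_valid(unsigned long v)
{
    return v != 0;
}

/* Loads *a exactly once, then only ever looks at the local copy. */
int check_once(unsigned long *a)
{
    unsigned long b = READ_ONCE_SKETCH(*a);

    if (is_valid(b))
        return 1; /* do stuff */
    return 0;
}
```

Because the load goes through a volatile-qualified lvalue, the compiler
has to perform it as written and cannot replace the later use of `b`
with a second read of `*a`.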

Now consider the following code:

    // Is this ok?
    unsigned long *a, b;
    memcpy(&b, a, sizeof(unsigned long));
    if (is_valid(b)) {
        // do stuff
    }

If LLVM understands the memcpy in the same way as it understands

    b = *a; // same as memcpy, right?

then by the above discussion, the memcpy is not enough either. And Rust
documents that it may treat copy_nonoverlapping() in exactly that way,
which is why we want a memcpy where reading the values more than once is
not a permitted optimization. In most discussions of that topic, that's
called a per-byte atomic memcpy.

Does this optimization happen in the real world? I have no clue. I'd
rather not find out.

Alice