Re: [PATCH v2] rust: page: add byte-wise atomic memory copy methods

From: Peter Zijlstra

Date: Tue Feb 17 2026 - 05:28:24 EST

On Tue, Feb 17, 2026 at 10:01:56AM +0000, Alice Ryhl wrote:
> On Tue, Feb 17, 2026 at 10:45:15AM +0100, Peter Zijlstra wrote:
> > On Tue, Feb 17, 2026 at 09:33:40AM +0000, Alice Ryhl wrote:
> > > On Tue, Feb 17, 2026 at 10:13:48AM +0100, Peter Zijlstra wrote:
> > > > On Fri, Feb 13, 2026 at 08:19:17AM -0800, Boqun Feng wrote:
> > > > > Well, in standard C, technically memcpy() has the same problem as Rust's
> > > > > `core::ptr::copy()` and `core::ptr::copy_nonoverlapping()`, i.e. they
> > > > > are vulnerable to data races. Our in-kernel memcpy() on the other hand
> > > > > doesn't have this problem. Why? Because it's volatile byte-wise atomic
> > > > > per the implementation.
> > > >
> > > > Look at arch/x86/lib/memcpy_64.S, plenty of movq variants there. Not
> > > > byte-wise.
> > >
> > > movq is a valid implementation of 8 byte-wise copies.
> > >
> > > > Also, not a single atomic operation in sight.
> > >
> > > Relaxed atomics are just mov ops.
> >
> > They are not atomics at all.
>
> Atomic loads and stores are just mov ops, right? Sure, RMW operations do
> more complex stuff, but I'm pretty sure that relaxed atomic loads/stores
> generally are compiled as mov ops.

Yeah, because they're not in fact atomic. I have, on various occasions,
told people to not use atomic_t if all they end up doing is atomic_set()
and atomic_read(). They're just loads and stores, nothing atomic about
them.

They are just there to complete the interactions with the actual RmW
operations.
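To make the load/store versus RmW distinction concrete, here is a minimal userspace sketch. It assumes C11 <stdatomic.h> and POSIX threads as stand-ins for the kernel's atomic_t API: atomic_read()/atomic_set() are approximated by relaxed loads/stores and atomic_inc() by atomic_fetch_add_explicit(). The names counter, inc_rmw(), inc_load_store() and run_two() are hypothetical and exist only for this illustration.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared counter used only for this illustration. */
static atomic_long counter;

/* Read-modify-write: each increment is one indivisible operation
 * (a LOCK-prefixed instruction on x86). */
static void *inc_rmw(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++)
		atomic_fetch_add_explicit(&counter, 1, memory_order_relaxed);
	return NULL;
}

/* Relaxed load followed by relaxed store: on x86 each is a plain mov,
 * and another thread can slip in between them, so updates can be lost.
 * This is the atomic_read()/atomic_set() pattern. */
static void *inc_load_store(void *arg)
{
	long n = *(long *)arg;

	for (long i = 0; i < n; i++) {
		long v = atomic_load_explicit(&counter, memory_order_relaxed);
		atomic_store_explicit(&counter, v + 1, memory_order_relaxed);
	}
	return NULL;
}

/* Run fn in two threads, each doing n increments; return the final count. */
static long run_two(void *(*fn)(void *), long n)
{
	pthread_t t1, t2;

	atomic_store(&counter, 0);
	pthread_create(&t1, NULL, fn, &n);
	pthread_create(&t2, NULL, fn, &n);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	return atomic_load(&counter);
}
```

With two threads doing 100000 increments each, run_two(inc_rmw, 100000) always returns 200000, while run_two(inc_load_store, 100000) may return less, because an update made between another thread's load and store is overwritten.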

> > Somewhere along the line 'atomic' seems to have lost any and all meaning
> > :-(
> >
> > It must be this C committee and their weasel speak for fear of reality
> > that has infected everyone or somesuch.
> >
> > Anyway, all you really want is a normal memcpy, and somehow Rust cannot
> > provide one? WTF?!
>
> Forget about Rust for a moment.
>
> Consider this code:
>
> // Is this ok?
> unsigned long *a, b;
> b = *a;
> if is_valid(b) {
> // do stuff
> }

Syntax error on is_valid(); you need an opening ( after if.

> I can easily imagine that LLVM might optimize this into:
>
> // Uh oh!
> unsigned long *a, b;
> b = *a;
> if is_valid(*a) { // <- this was "optimized"
> // do stuff
> }

Well, the compiler would not do anything, since it wouldn't compile :-) But
sure, that is a valid transform.

> the argument being that you used an ordinary load of `a`, so it can be
> assumed that there are no concurrent writes, so both reads are
> guaranteed to return the same value.
>
> So if `a` might be concurrently modified, then we are unhappy.
>
> Of course, if *a is replaced with an atomic load such as READ_ONCE(*a), the
> optimization would no longer occur.

Stop using atomic for this. It is not atomic.

The key here is volatile: it indicates that the value can change outside
the compiler's view, and thus a re-load is not valid. And I know the C
language people hate volatile, but there it is.
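A minimal sketch of that point, assuming GCC or Clang (for __typeof__): SKETCH_READ_ONCE() is a simplified, hypothetical stand-in for the kernel's READ_ONCE(), which handles more cases than this, and is_valid() is a made-up check. The volatile access forces exactly one load of *a, so the validated local copy is what gets used afterwards.

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-in for READ_ONCE(): access through a
 * volatile-qualified lvalue, so the compiler must emit exactly one load
 * and may not later re-load *a in place of the local copy. */
#define SKETCH_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical validity check, standing in for is_valid() above. */
static bool is_valid(unsigned long v)
{
	return v != 0 && v < 4096;
}

/* Snapshot *a once, then validate and use only that snapshot. */
static bool check_and_use(unsigned long *a, unsigned long *out)
{
	unsigned long b = SKETCH_READ_ONCE(*a);

	if (!is_valid(b))
		return false;
	*out = b;	/* the validated value, never a fresh read of *a */
	return true;
}
```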

> // OK!
> unsigned long *a, b;
> b = READ_ONCE(*a);
> if is_valid(b) {
> // do stuff
> }
>
> Now consider the following code:
>
> // Is this ok?
> unsigned long *a, b;
> memcpy(&b, a, sizeof(unsigned long));
> if is_valid(b) {
> // do stuff
> }

Why the hell would you want to write that? But sure. I think a similar but
less weird example would be with structures, where value copies end up
being similar to memcpy.

And in that case, you can still use volatile, and the compiler must not do
anything silly.
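For the structure case, a sketch of what using volatile could look like, with a hypothetical struct shared_cfg and helper snapshot_cfg(): each field is read once through a volatile-qualified pointer into a local copy. This only forbids re-loads; it does not make the snapshot consistent if a writer races with it.

```c
/* Hypothetical structure that something outside the compiler's view
 * may modify. */
struct shared_cfg {
	unsigned long base;
	unsigned long len;
};

/* Copy *src field by field through volatile lvalues. Each field is loaded
 * exactly once, so code using the returned copy can never have it
 * silently replaced by a re-load of *src. The copy as a whole is not
 * atomic: a concurrent writer can still leave base and len mismatched. */
static struct shared_cfg snapshot_cfg(const struct shared_cfg *src)
{
	const volatile struct shared_cfg *v = src;
	struct shared_cfg copy;

	copy.base = v->base;
	copy.len = v->len;
	return copy;
}
```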

> If LLVM understands the memcpy in the same way as how it understands
>
> b = *a; // same as memcpy, right?
>
> then by above discussion, the memcpy is not enough either. And Rust
> documents that it may treat copy_nonoverlapping() in exactly that way,
> which is why we want a memcpy where reading the values more than once is
> not a permitted optimization. In most discussions of that topic, that's
> called a per-byte atomic memcpy.
>
> Does this optimization happen in the real world? I have no clue. I'd
> rather not find out.
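A plain-C sketch of a copy with that property, assuming nothing beyond standard C: volatile_copy_sketch() is a hypothetical helper, neither the kernel's memcpy() nor the Rust API proposed in this patch. It reads each source byte exactly once through a volatile lvalue. A real implementation would use wider moves (the movq point above); the byte loop is only the simplest illustration of "no re-reads".

```c
#include <stddef.h>

/* Hypothetical helper: copy n bytes from src to dst, reading each source
 * byte exactly once through a volatile lvalue. Because volatile accesses
 * must be performed as written, the compiler can neither re-read src
 * later nor turn the loop into a plain memcpy() call. */
static void *volatile_copy_sketch(void *dst, const volatile void *src, size_t n)
{
	unsigned char *d = dst;
	const volatile unsigned char *s = src;

	for (size_t i = 0; i < n; i++)
		d[i] = s[i];	/* one single-byte load per source byte */
	return dst;
}
```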

OK, but none of this has anything to do with atomic or byte-wise.

The whole byte-wise thing turns out to be about not allowing
out-of-thin-air values. Nothing should ever allow that.

Anyway, normal userspace copies don't suffer from this, because accessing
userspace has enough magical crap around it to inhibit this optimization
in any case.

If it's a shared mapping/DMA, you'd typically end up with barriers
anyway, and those have a memory clobber on them, which tells the compiler
that reloads aren't valid.
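The compiler-side effect of such barriers can be sketched as follows, assuming GCC or Clang inline asm; the kernel's barrier() has essentially this form. The names sketch_barrier(), is_valid_len() and read_validated() are hypothetical. The sketch only shows that a value loaded before the barrier cannot be replaced by a re-load of *a after it.

```c
#include <stdbool.h>

/* Compiler-only barrier: an empty asm with a "memory" clobber tells the
 * compiler that any memory may have changed at this point. */
#define sketch_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical validity check. */
static bool is_valid_len(unsigned long v)
{
	return v <= 4096;
}

static unsigned long read_validated(unsigned long *a)
{
	unsigned long b = *a;	/* plain load */

	sketch_barrier();	/* *a may differ from b from here on, so the
				 * compiler cannot substitute a re-load of *a
				 * for b below */
	return is_valid_len(b) ? b : 0;
}
```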

So I'm still not exactly sure why this is a problem all of a sudden?