Re: [PATCH v2] rust: page: add byte-wise atomic memory copy methods
From: Alice Ryhl
Date: Tue Feb 17 2026 - 05:47:23 EST
On Tue, Feb 17, 2026 at 11:25:57AM +0100, Peter Zijlstra wrote:
> On Tue, Feb 17, 2026 at 10:01:56AM +0000, Alice Ryhl wrote:
> > On Tue, Feb 17, 2026 at 10:45:15AM +0100, Peter Zijlstra wrote:
> > > On Tue, Feb 17, 2026 at 09:33:40AM +0000, Alice Ryhl wrote:
> > > > On Tue, Feb 17, 2026 at 10:13:48AM +0100, Peter Zijlstra wrote:
> > > > > On Fri, Feb 13, 2026 at 08:19:17AM -0800, Boqun Feng wrote:
> > > > > > Well, in standard C, technically memcpy() has the same problem as Rust's
> > > > > > `core::ptr::copy()` and `core::ptr::copy_nonoverlapping()`, i.e. they
> > > > > > are vulnerable to data races. Our in-kernel memcpy() on the other hand
> > > > > > doesn't have this problem. Why? Because it's volatile byte-wise atomic
> > > > > > per the implementation.
> > > > >
> > > > > Look at arch/x86/lib/memcpy_64.S, plenty of movq variants there. Not
> > > > > byte-wise.
> > > >
> > > > movq is a valid implementation of 8 byte-wise copies.
> > > >
> > > > > Also, not a single atomic operation in sight.
> > > >
> > > > Relaxed atomics are just mov ops.
> > >
> > > They are not atomics at all.
> >
> > Atomic loads and stores are just mov ops, right? Sure, RMW operations do
> > more complex stuff, but I'm pretty sure that relaxed atomic loads/stores
> > generally are compiled as mov ops.
>
> Yeah, because they're not in fact atomic. I have, on various occasions,
> told people to not use atomic_t if all they end up doing is atomic_set()
> and atomic_read(). They're just loads and stores, nothing atomic about
> them.
>
> They are just there to complete the interactions with the actual RmW
> operations.
>
> > > Somewhere along the line 'atomic' seems to have lost any and all meaning
> > > :-(
> > >
> > > It must be this C committee and their weasel speak for fear of reality
> > > that has infected everyone or somesuch.
> > >
> > > Anyway, all you really want is a normal memcpy and somehow Rust cannot
> > > provide? WTF?!
> >
> > Forget about Rust for a moment.
> >
> > Consider this code:
> >
> > // Is this ok?
> > unsigned long *a, b;
> > b = *a;
> > if is_valid(b) {
> > // do stuff
> > }
>
> Syntax error on is_valid(), need opening ( after if.

Oops, too much Rust for me :)

> > I can easily imagine that LLVM might optimize this into:
> >
> > // Uh oh!
> > unsigned long *a, b;
> > b = *a;
> > if is_valid(*a) { // <- this was "optimized"
> > // do stuff
> > }
>
> Well, compiler would not do anything, since it wouldn't compile :-) But
> sure, that is valid transform.
>
> > the argument being that you used an ordinary load of `a`, so it can be
> > assumed that there are no concurrent writes, so both reads are
> > guaranteed to return the same value.
> >
> > So if `a` might be concurrently modified, then we are unhappy.
> >
> > Of course, if *a is replaced with an atomic load such as READ_ONCE(a) an
> > optimization would no longer occur.
>
> Stop using atomic for this. Is not atomic.
>
> Key here is volatile, that indicates value can change outside of scope
> and thus re-load is not valid. And I know C language people hates
> volatile, but there it is.

Well, don't complain to me about this. I sent a patch to add READ_ONCE()/
WRITE_ONCE() impls for Rust and was told to just use atomics instead,
see: https://lwn.net/Articles/1053142/

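For reference, the volatile-access idea can be sketched in userspace C
roughly as below. This is a simplified stand-in, not the kernel's actual
READ_ONCE() definition: READ_ONCE_SKETCH(), is_valid() and
load_and_check() are made-up names, and the macro assumes the GNU
__typeof__ extension (gcc/clang).

```c
/*
 * Simplified userspace stand-in for a READ_ONCE()-style access.
 * Hypothetical name; relies on the GNU __typeof__ extension.
 * The volatile-qualified access tells the compiler the value may change
 * outside its view, so it must not replace a later use of the result
 * with a second load from the same location.
 */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical validity check, for illustration only. */
static int is_valid(unsigned long v)
{
	return v != 0;
}

/* Load *a exactly once, then check and use only the local copy. */
static unsigned long load_and_check(unsigned long *a)
{
	unsigned long b = READ_ONCE_SKETCH(*a);

	if (is_valid(b))
		return b;	/* "do stuff" with the checked value */

	return 0;
}
```
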
> > // OK!
> > unsigned long *a, b;
> > b = READ_ONCE(a);
> > if is_valid(b) {
> > // do stuff
> > }
> >
> > Now consider the following code:
> >
> > // Is this ok?
> > unsigned long *a, b;
> > memcpy(a, &b, sizeof(unsigned long));
> > if is_valid(b) {
> > // do stuff
> > }
>
> Why the hell would you want to write that? But sure. I think similar but
> less weird example would be with structures, where value copies end up
> being similar to memcpy.

I mean sure, let's say that it was a structure or whatever instead of a
long. The point is that the general pattern of memcpy, then checking the
bytes you copied, then using the bytes you copied, is potentially
susceptible to this exact optimization.

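To make the pattern concrete, here is a minimal userspace sketch with a
structure. It is not code from the patch; struct msg, msg_is_valid() and
msg_fetch_len() are made-up names used only for illustration.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical message layout, for illustration only. */
struct msg {
	unsigned int len;
	unsigned char data[16];
};

/* Hypothetical validity check: len must fit within data[]. */
static bool msg_is_valid(const struct msg *m)
{
	return m->len <= sizeof(m->data);
}

/*
 * Copy, check, then use. Returns the checked length, or -1 if invalid.
 *
 * This is only safe if both the check and the use see the bytes held in
 * `local`. If the compiler is allowed to treat the memcpy() like an
 * ordinary load of *shared, it may read *shared again instead, and a
 * concurrent writer could change len between the check and the use.
 */
static int msg_fetch_len(const struct msg *shared)
{
	struct msg local;

	memcpy(&local, shared, sizeof(local));
	if (!msg_is_valid(&local))
		return -1;

	return (int)local.len;
}
```
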
> And in that case, you can still use volatile and compiler must not do
> silly.

What you mean by "volatile" here is the same as what this patch means
when it says "per-byte atomic". If you agree that a "volatile memcpy"
would be a good idea to use in this scenario, then it sounds like you
agree with the patch except for its naming / terminology.

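As a rough illustration of what a "volatile memcpy" could look like, here
is a userspace sketch. It is not the patch's implementation, and the name
volatile_memcpy() is hypothetical. A real version would presumably copy in
larger units where possible; this one stays byte-at-a-time for clarity.

```c
#include <stddef.h>

/*
 * Userspace sketch only; volatile_memcpy() is a hypothetical name.
 *
 * Each source byte is read through a volatile-qualified pointer, so the
 * compiler has to perform exactly one load per byte and may not re-read
 * src later in place of the value already stored in dst.
 */
static void *volatile_memcpy(void *dst, const volatile void *src, size_t len)
{
	unsigned char *d = dst;
	const volatile unsigned char *s = src;
	size_t i;

	for (i = 0; i < len; i++)
		d[i] = s[i];

	return dst;
}
```
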
> > If LLVM understands the memcpy in the same way as how it understands
> >
> > b = *a; // same as memcpy, right?
> >
> > then by above discussion, the memcpy is not enough either. And Rust
> > documents that it may treat copy_nonoverlapping() in exactly that way,
> > which is why we want a memcpy where reading the values more than once is
> > not a permitted optimization. In most discussions of that topic, that's
> > called a per-byte atomic memcpy.
> >
> > Does this optimization happen in the real world? I have no clue. I'd
> > rather not find out.
>
> OK, but none of this has anything to do with atomic or byte-wise.
>
> The whole byte-wise thing turns out to be about not allowing
> out-of-thin-air. Nothing should ever allow that.

It's not just about out-of-thin-air, it's also the kind of optimization
I mentioned.

> Anyway, normal userspace copies don't suffer this because accessing
> userspace has enough magical crap around it to inhibit this optimization
> in any case.
>
> If its a shared mapping/DMA, you'd typically end up with barriers
> anyway, and those have a memory clobber on them which tell the compiler
> reloads aren't good.
>
> So I'm still not exactly sure why this is a problem all of a sudden?

I mean, this is for `struct page` specifically. If you have the struct
page for a page that might also be mapped into a userspace vma, then the
way to perform a "copy_from_user" operation is to:

1. kmap_local_page()
2. memcpy()
3. kunmap_local()

Correct me if I'm wrong, but my understanding is that on 64-bit systems,
kmap/kunmap are usually complete no-ops since you have enough address
space to simply map all pages into the kernel's address space. Not even
a barrier - just a `static inline` with an empty body.

Alice