Re: C aggregate passing (Rust kernel policy)
From: Ralf Jung
Date: Tue Mar 04 2025 - 13:47:18 EST
Hi all,
Whether the compiler is permitted to do that depends heavily on what exactly
the code looks like, so it's hard to discuss this in the abstract.
If inside some function, *all* writes to a given location are atomic (I
think that's what you call WRITE_ONCE?), then the compiler is *not* allowed
to invent any new writes to that memory. The compiler has to assume that
there might be concurrent reads from other threads, whose behavior could
change from the extra compiler-introduced writes. The spec (in C, C++, and
Rust) already works like that.
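
To make that first case concrete, here is a minimal Rust sketch (the
function and variable names are just illustrative, not from any real code):

    use std::sync::atomic::{AtomicU32, Ordering};

    // Every write to `flag` in this function is atomic, so the compiler
    // must not invent additional stores to it: another thread might be
    // reading `flag` concurrently, and a spurious store could change the
    // values that thread is able to observe.
    fn publish(flag: &AtomicU32) {
        // Roughly the analogue of the kernel's WRITE_ONCE().
        flag.store(1, Ordering::Relaxed);
    }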
OTOH, the moment you do a single non-atomic write (i.e., a regular "*ptr =
val;" or memcpy or so), that is a signal to the compiler that there cannot
be any concurrent accesses happening at the moment, and therefore it can
(and likely will) introduce extra writes to that memory.
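
Again as a hedged sketch (the raw pointer and the names are illustrative):

    // A plain, non-atomic write. The compiler may assume no other thread
    // is accessing `*slot` at this moment, so it is free to introduce
    // extra writes here, e.g. splitting the store into several smaller
    // ones or temporarily using the location as scratch space.
    unsafe fn init(slot: *mut u32, val: u32) {
        *slot = val;
    }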
> Is that how it really works?
> I'd expect the atomic writes to have what we call "compiler barriers"
> before and after; IOW, the compiler can do whatever it wants with
> non-atomic writes, provided it doesn't cross those barriers.
If you do a non-atomic write, and then an atomic release write, that release
write marks communication with another thread. When I said "concurrent accesses
[...] at the moment" above, the details of what exactly that means matter a lot:
by doing an atomic release write, the "moment" has passed, as now other threads
could be observing what happened.
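
In code, this is the classic message-passing pattern; a Rust sketch
(the `Shared` type is mine, just for illustration):

    use std::cell::UnsafeCell;
    use std::sync::atomic::{AtomicBool, Ordering};

    struct Shared {
        data: UnsafeCell<u32>,
        ready: AtomicBool,
    }
    // SAFETY: all cross-thread access to `data` is ordered via `ready`.
    unsafe impl Sync for Shared {}

    // Writer: a plain write, then a release store. The release store is
    // the point at which other threads may start observing `data`.
    fn writer(s: &Shared) {
        unsafe { *s.data.get() = 42 };           // non-atomic write
        s.ready.store(true, Ordering::Release);  // "the moment has passed"
    }

    // Reader: if the acquire load sees `true`, the preceding non-atomic
    // write to `data` is guaranteed to be visible; there is no data race.
    fn reader(s: &Shared) -> Option<u32> {
        if s.ready.load(Ordering::Acquire) {
            Some(unsafe { *s.data.get() })
        } else {
            None
        }
    }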
One can get quite far thinking about these things in terms of "barriers" that
block the compiler from reordering operations, but that is not actually what
happens. The underlying model is based on describing the set of behaviors that a
program can have when using particular atomic memory orderings (such as
release, acquire, relaxed); the compiler is responsible for ensuring that the
resulting program only exhibits those behaviors. A "barrier"-based approach is
one way to achieve that, but not the only one: at least in special cases,
compilers can and do perform more optimizations. The only thing that matters
is that the resulting program still behaves as if it were executed according
to the rules of the language, i.e., the program execution must be captured by
the set of behaviors that the memory model for atomics permits. This set of behaviors is,
btw, completely portable; this is truly an abstract semantics and not tied to
what any particular hardware does.
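
One example of an optimization that a pure "barriers" view would not
predict (again just a sketch):

    use std::sync::atomic::{AtomicU32, Ordering};

    // The compiler may legally coalesce these two relaxed stores into a
    // single store of 2: every behavior of the transformed program is
    // also a permitted behavior of the original under the memory model,
    // even though no "barrier" was ever crossed.
    fn double_store(x: &AtomicU32) {
        x.store(1, Ordering::Relaxed);
        x.store(2, Ordering::Relaxed);
    }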
Now, that's the case for general C++ or Rust. The Linux kernel is special in
that its concurrency support predates the official model, so it is written in a
different style, commonly referred to as the Linux kernel memory model (LKMM).
I'm not aware of a formal study
of that model to the same level of rigor as the C++ model, so for me as a
theoretician it is much harder to properly understand what happens there,
unfortunately. My understanding is that many LKMM operations can be mapped to
equivalent C++ operations (e.g., WRITE_ONCE and READ_ONCE correspond to
relaxed atomic stores and loads, respectively). However, the LKMM also makes use of dependencies
(address and/or data dependencies? I am not sure), and unfortunately those
fundamentally clash with even basic compiler optimizations such as GVN/CSE or
algebraic simplifications, so it's not at all clear how they can even be used in
an optimizing compiler in a formally sound way (i.e., "we could, in principle,
mathematically prove that this is correct"). Finding a rigorous way to equip an
optimized language such as C, C++, or Rust with concurrency primitives that emit
the same efficient assembly code as what the LKMM can produce is, I think, an
open problem. Meanwhile, the LKMM seems to work in practice despite those
concerns, and that should apply to both C (when compiled with clang) and Rust in
the same way -- but when things go wrong, the lack of a rigorous contract will
make it harder to determine whether the bug is in the compiler or the kernel.
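
To illustrate the clash, here is a Rust sketch (relaxed loads playing the
role of READ_ONCE; the names are mine) of how an ordinary value optimization
can erase an intended address dependency:

    use std::sync::atomic::{AtomicUsize, Ordering};

    fn dependent_load(idx: &AtomicUsize, arr: &[AtomicUsize]) -> usize {
        let i = idx.load(Ordering::Relaxed);
        if i == 0 {
            // Inside this branch the compiler knows `i == 0`, so GVN may
            // rewrite `arr[i]` as `arr[0]`. The emitted load then no
            // longer depends on the load of `idx`, and any hardware
            // ordering that the address dependency was supposed to
            // provide is silently lost.
            arr[i].load(Ordering::Relaxed)
        } else {
            0
        }
    }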
But again, Rust should behave exactly like clang here, so this should not be a
new concern. :)
Kind regards,
Ralf