On Mon, Jan 30, 2023 at 01:23:28PM +0100, Jonas Oberhauser wrote:
> On 1/27/2023 11:09 PM, Boqun Feng wrote:
> > On Fri, Jan 27, 2023 at 03:34:33PM +0100, Peter Zijlstra wrote:
> > > > Hijack this thread a little bit, but while we are at it, do you think it
> > > > makes sense that we have a config option that allows archs to
> > > > implement LKMM atomics via C11 (volatile) atomics? I know there are gaps
> > > > between the two memory models, but the option is only for a fallback/generic
> > > > implementation, so we can put extra barriers/orderings to make things
> > > > guaranteed to work.
> > > >
> > > > I also noticed that GCC has some builtins/extensions to do such things,
> > > > __atomic_OP_fetch and __atomic_fetch_OP, but I do not know if these
> > > > can be used in the kernel.
> > >
> > > On a per-architecture basis only, the C/C++ memory model does not match
> > > the Linux Kernel memory model, so using the compiler to generate the
> > > atomic ops is somewhat tricky and needs architecture audits.
>
> Another is that the C11 model is more about atomic locations than atomic
> accesses, and there are several places in the kernel where a location is
> accessed both atomically and non-atomically. This API mismatch is more
> severe than the semantic differences in my opinion, since you don't have
> guarantees of what the layout of atomics is going to be.

True, but we have the same problem with our asm-implemented atomics, right?
My plan is to do (volatile atomic_int *) casts on these locations.
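Roughly, what I have in mind is something like this (only a sketch: struct
foo and the helpers are made up, and the cast assumes atomic_int has the
same layout as a plain int, which is exactly the guarantee you point out we
don't get from the standard):

	#include <stdatomic.h>

	struct foo {
		int counter;	/* accessed both atomically and non-atomically */
	};

	/* Plain (non-atomic) access, e.g. during init before publication. */
	static void foo_init(struct foo *f)
	{
		f->counter = 0;
	}

	/*
	 * Atomic access in the C11-based fallback: cast the plain int to a
	 * volatile C11 atomic and go through <stdatomic.h> from there
	 * (ordering caveats are a separate discussion, see below).
	 */
	static int foo_inc_return(struct foo *f)
	{
		volatile atomic_int *v = (volatile atomic_int *)&f->counter;

		return atomic_fetch_add_explicit(v, 1, memory_order_seq_cst) + 1;
	}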
> Perhaps you could instead rely on the compiler builtins? Note that this may
> invalidate some progress properties, e.g., ticket locks become unfair if the
> increment (for taking a ticket) is implemented with a CAS loop (because a
> thread can fail forever to get a ticket if the ticket counter is contended,
> and thus starve). There may be some Linux atomics that don't map to any
> compiler builtin and need to be implemented with such CAS loops, potentially
> leading to such problems.

These are less formal/well-defined to me, and I assume there has not been
much research on them, so I'd rather not use them.
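For reference, the unfairness you mention is roughly the difference between
these two ways of taking a ticket (a sketch with the GCC builtins; neither
function is actual kernel code):

	/*
	 * Unconditional increment: every ticket take eventually succeeds,
	 * the hardware atomic add arbitrates between contending CPUs.
	 */
	static inline unsigned int ticket_take(unsigned int *next)
	{
		return __atomic_fetch_add(next, 1, __ATOMIC_ACQUIRE);
	}

	/*
	 * CAS-loop emulation: a CPU whose compare-and-swap keeps losing to
	 * other CPUs can, in principle, loop forever, which is the
	 * starvation concern above.
	 */
	static inline unsigned int ticket_take_cas(unsigned int *next)
	{
		unsigned int old = __atomic_load_n(next, __ATOMIC_RELAXED);

		while (!__atomic_compare_exchange_n(next, &old, old + 1,
						    0 /* strong */,
						    __ATOMIC_ACQUIRE,
						    __ATOMIC_RELAXED))
			;	/* "old" is updated on failure, retry */

		return old;
	}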
> I'm also curious whether link time optimization can resolve the inlining
> issue?

For the Rust case, cross-language LTO is needed I think, and last time I
tried it, it didn't work.
> I think another big question for me is to which extent it makes sense
> anyway to have shared memory concurrency between the Rust code and the C
> code. It seems all the bad concurrency stuff from the C world would flow
> into the Rust world, right?

What do you mean by "bad"? ;-) ;-) ;-)
> If you can live without shared Rust & C concurrency, then perhaps you can
> get away without using LKMM in Rust at all, and just rely on its (C11-like)
> memory model internally and talk to the C code through synchronous, safer
> ways.

First, I don't think I can avoid using LKMM in Rust. Besides the
communication between the two sides, what if kernel developers just want to
use the memory model they already know and understand (i.e. LKMM) in a new
Rust driver? They probably already have a working parallel algorithm based
on LKMM.

Further, let's say we make C and Rust talk without shared memory
concurrency: what would that mechanism be? Would it be more defined/formal
than LKMM? And what would the cost of such synchronous ways be? I personally
think there are places in the core kernel where Rust can be tried, and
whatever mechanism is used, it cannot sacrifice performance.
> I'm not against having a fallback builtin-based implementation of LKMM, and
> I don't think that it really needs architecture audits.

Fun fact: there exist some "optimizations" that don't generate the asm
code you want:

	https://github.com/llvm/llvm-project/issues/56450

Needless to say, these are bugs and will be fixed; besides, making the
atomics volatile seems to avoid these "optimizations".
> What it needs is
> some additional compiler barriers and memory barriers, to ensure that the
> arguments about dependencies and non-atomics still hold. E.g., a release
> store may not just be a "builtin release store" but may need to have a
> compiler barrier to prevent the release store from being moved in program
> order. And a "full barrier" exchange may need an mb() in front of the
> operation to avoid "roach motel ordering" (i.e., x=1; "full barrier
> exchange"; y=1 allows y=1 to execute before x=1 with the compiler builtins,
> as far as I remember). And there may be some other cases like this.
Agreed. And this is another reason I want to do it: I'm curious about how far
the C11 memory model and LKMM are apart, whether there is a way to implement
one with the other, what the gaps are (theoretical and practical), and
whether the orderings we have in LKMM (mostly the dependencies) can be
provided by compilers.
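To make your exchange example concrete, I'd expect the fallback to end up
looking something like this (just a sketch; fallback_xchg_int() is a made-up
name, and whether one barrier in front is enough, or one is needed on each
side, is exactly the kind of thing that has to be worked out):

	/*
	 * Hypothetical fully-ordered xchg() fallback for an int, built on
	 * the compiler builtin.  The smp_mb() is the extra barrier from the
	 * discussion above: a SEQ_CST exchange alone does not order a plain
	 * store before it against a plain store after it (the x=1 / y=1
	 * example), while LKMM's fully-ordered RMWs do.
	 */
	static inline int fallback_xchg_int(int *ptr, int new)
	{
		smp_mb();
		return __atomic_exchange_n(ptr, new, __ATOMIC_SEQ_CST);
	}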
More importantly, could we improve both to get something better, with the
ability to exactly express the programmers' intentions yet still allow
optimizations by the compilers?
> But I currently don't see that this implementation would be noticeably
> faster than paying the overhead of the lack of inlining.

You are not wrong; surely we will need a real benchmark to know. But my
rationale is that 1) in theory this is faster, and 2) we also get a chance
to try out code based on LKMM implemented with C11 atomics and see where it
hurts. Therefore I asked ;-)