Re: C aggregate passing (Rust kernel policy)
From: Kent Overstreet
Date: Fri Feb 28 2025 - 11:22:14 EST
On Fri, Feb 28, 2025 at 08:13:09AM -0800, Boqun Feng wrote:
> On Fri, Feb 28, 2025 at 11:04:28AM -0500, Kent Overstreet wrote:
> > On Fri, Feb 28, 2025 at 07:46:23AM -0800, Boqun Feng wrote:
> > > On Fri, Feb 28, 2025 at 10:41:12AM -0500, Kent Overstreet wrote:
> > > > On Fri, Feb 28, 2025 at 08:44:58AM +0100, Ralf Jung wrote:
> > > > > Hi,
> > > > >
> > > > > > > I guess you can sum this up to:
> > > > > > >
> > > > > > > The compiler should never assume it's safe to read a global more
> > > > > > > often than the code specifies, but if the code reads a global more
> > > > > > > than once, it's fine to cache the multiple reads.
> > > > > > >
> > > > > > > Same for writes, but I find WRITE_ONCE() used less often than READ_ONCE().
> > > > > > > And when I do use it, it is more to prevent write tearing as you mentioned.
> > > > > >
> > > > > > Except that (IIRC) it is actually valid for the compiler to write something
> > > > > > entirely unrelated to a memory location before writing the expected value.
> > > > > > (eg use it instead of stack for a register spill+reload.)
> > > > > > Not that gcc does that - but the standard lets it.
> > > > >
> > > > > Whether the compiler is permitted to do that depends heavily on what exactly
> > > > > the code looks like, so it's hard to discuss this in the abstract.
> > > > > If inside some function, *all* writes to a given location are atomic (I
> > > > > think that's what you call WRITE_ONCE?), then the compiler is *not* allowed
> > > > > to invent any new writes to that memory. The compiler has to assume that
> > > > > there might be concurrent reads from other threads, whose behavior could
> > > > > change from the extra compiler-introduced writes. The spec (in C, C++, and
> > > > > Rust) already works like that.
> > > > >
> > > > > OTOH, the moment you do a single non-atomic write (i.e., a regular "*ptr =
> > > > > val;" or memcpy or so), that is a signal to the compiler that there cannot
> > > > > be any concurrent accesses happening at the moment, and therefore it can
> > > > > (and likely will) introduce extra writes to that memory.
> > > >
> > > > Is that how it really works?
> > > >
> > > > I'd expect the atomic writes to have what we call "compiler barriers"
> > > > before and after; IOW, the compiler can do whatever it wants with
> > > > non-atomic accesses
> > >
> > > If the atomic writes are relaxed, they shouldn't have "compiler
> > > barriers" before or after, e.g. our kernel atomics don't have such
> > > compiler barriers. And WRITE_ONCE() is basically relaxed atomic writes.
> >
> > Then perhaps we need a better definition of ATOMIC_RELAXED?
> >
> > I've always taken ATOMIC_RELAXED to mean "may be reordered with accesses
> > to other memory locations". What you're describing seems likely to cause
>
> You lost me on this one. If RELAXED means "reordering is allowed", then
> why would compiler barriers be implied by it?
Yes, "compiler barrier" is the wrong language here.
> > e.g. if you allocate a struct, memset() it to zero it out, then publish
> > it, then do a WRITE_ONCE()...
>
> How do you publish it? If you mean:
>
> // assume gp == NULL initially.
>
> *x = 0;
> smp_store_release(&gp, x);
>
> WRITE_ONCE(*x, 1);
>
> and the other thread does
>
> x = smp_load_acquire(&gp);
> if (x) {
> r1 = READ_ONCE(*x);
> }
>
> r1 can be either 0 or 1.
So as long as the compiler honors the store_release ordering, we're OK.
IOW, that has to override the "compiler treats the non-atomic store as a
hint..." behaviour - but the thing is, since we're moving more towards
type-system-described concurrency than helpers, I wonder whether that
will actually remain the case.
Also, what's the situation with reads? Can we end up in a situation
where a non-atomic read causes the compiler to do erroneous things with
an atomic_load(..., relaxed)?