Re: [PATCH v2] barriers: introduce smp_mb__release_acquire and update documentation
From: Boqun Feng
Date: Sun Oct 18 2015 - 21:18:23 EST
On Fri, Oct 09, 2015 at 10:40:39AM +0100, Will Deacon wrote:
> On Fri, Oct 09, 2015 at 10:31:38AM +0200, Peter Zijlstra wrote:
[snip]
> >
> > So lots of little confusions added up to complete fail :-{
> >
> > Mostly I think it was the UNLOCK x + LOCK x are fully ordered (where I
> > forgot: but not against uninvolved CPUs) and RELEASE/ACQUIRE are
> > transitive (where I forgot: RELEASE/ACQUIRE _chains_ are transitive, but
> > again not against uninvolved CPUs).
> >
> > Which leads me to think I would like to suggest alternative rules for
> > RELEASE/ACQUIRE (to replace those Will suggested; as I think those are
> > partly responsible for my confusion).
>
> Yeah, sorry. I originally used the phrase "fully ordered" but changed it
> to "full barrier", which has stronger transitivity (newly understood
> definition) requirements that I didn't intend.
>
> RELEASE -> ACQUIRE should be used for message passing between two CPUs
> and not have ordering effects on other observers unless they're part of
> the RELEASE -> ACQUIRE chain.
>
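For reference, a minimal userspace sketch of that two-CPU message-passing
pattern. It uses C11 atomics as a stand-in for the kernel's
smp_store_release()/smp_load_acquire(); that mapping is an assumption made
for illustration, and the names mp_data, mp_flag and mp_run are hypothetical:

```c
#include <pthread.h>
#include <stdatomic.h>

static int mp_data;          /* plain payload, published via mp_flag */
static atomic_int mp_flag;   /* 0 = not ready, 1 = ready */

static void *mp_writer(void *arg)
{
	(void)arg;
	mp_data = 42;                                /* write the payload */
	atomic_store_explicit(&mp_flag, 1,
			      memory_order_release); /* RELEASE */
	return NULL;
}

static void *mp_reader(void *arg)
{
	int *out = arg;

	/* ACQUIRE: spin until we read the value stored by the RELEASE. */
	while (!atomic_load_explicit(&mp_flag, memory_order_acquire))
		;
	*out = mp_data;      /* ordered after the ACQUIRE load */
	return NULL;
}

/* Runs one writer and one reader; returns what the reader observed. */
int mp_run(void)
{
	pthread_t w, r;
	int seen = 0;

	mp_data = 0;
	atomic_store(&mp_flag, 0);
	pthread_create(&r, NULL, mp_reader, &seen);
	pthread_create(&w, NULL, mp_writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	return seen;
}
```

Once the reader's ACQUIRE has read from the writer's RELEASE, it must see
the payload, so mp_run() returns 42. Nothing here says anything about what
a third, uninvolved thread would observe.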
> > - RELEASE -> ACQUIRE is fully ordered (but not a full barrier) when
> > they operate on the same variable and the ACQUIRE reads from the
> > RELEASE. Notably, RELEASE/ACQUIRE are RCpc and lack transitivity.
>
> Are we explicit about the difference between "fully ordered" and "full
> barrier" somewhere else, because this looks like it will confuse people.
>
This is confusing me right now. ;-)
Let's use a simple example with only one primitive. As I understand it,
if we say a primitive A is "fully ordered", we actually mean:
1. The memory operations preceding (in program order) A can't be
   reordered after the memory operations following (in PO) A.
and
2. The memory operation(s) in A can't be reordered before the
   memory operations preceding (in PO) A or after the memory
   operations following (in PO) A.
If we say A is a "full barrier", we actually mean:
1. The memory operations preceding (in program order) A can't be
   reordered after the memory operations following (in PO) A.
and
2. The memory ordering guarantee in #1 is visible globally.
Is that correct? Or is "full barrier" stronger than I understand,
i.e. is there a third property of "full barrier":
3. The memory operation(s) in A can't be reordered before the
   memory operations preceding (in PO) A or after the memory
   operations following (in PO) A.
IOW, is "full barrier" a stronger version of "fully ordered" or not?
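To make property #1 concrete, here is a userspace store-buffering sketch.
It assumes that a C11 atomic_thread_fence(memory_order_seq_cst) can stand
in for the primitive A (roughly smp_mb() in kernel terms, which is my
assumption, not something the patch states); the sb_* names are
hypothetical:

```c
#include <pthread.h>
#include <stdatomic.h>

static atomic_int sb_x, sb_y;
static int sb_r0, sb_r1;

static void *sb_cpu0(void *arg)
{
	(void)arg;
	atomic_store_explicit(&sb_x, 1, memory_order_relaxed); /* before A */
	atomic_thread_fence(memory_order_seq_cst);             /* A */
	sb_r0 = atomic_load_explicit(&sb_y, memory_order_relaxed); /* after A */
	return NULL;
}

static void *sb_cpu1(void *arg)
{
	(void)arg;
	atomic_store_explicit(&sb_y, 1, memory_order_relaxed); /* before A */
	atomic_thread_fence(memory_order_seq_cst);             /* A */
	sb_r1 = atomic_load_explicit(&sb_x, memory_order_relaxed); /* after A */
	return NULL;
}

/* Returns how many of `iters` runs ended with r0 == 0 && r1 == 0. */
int sb_count_forbidden(int iters)
{
	int i, forbidden = 0;

	for (i = 0; i < iters; i++) {
		pthread_t t0, t1;

		atomic_store(&sb_x, 0);
		atomic_store(&sb_y, 0);
		pthread_create(&t0, NULL, sb_cpu0, NULL);
		pthread_create(&t1, NULL, sb_cpu1, NULL);
		pthread_join(t0, NULL);
		pthread_join(t1, NULL);
		if (sb_r0 == 0 && sb_r1 == 0)
			forbidden++;
	}
	return forbidden;
}
```

With the fence in both threads, the store before A cannot be reordered
after the load following A, so the r0 == 0 && r1 == 0 outcome is forbidden
and the count stays at zero. Without the fences that outcome is allowed.
This only exercises property #1 between the two threads involved; it does
not test the "visible globally" part.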
Regards,
Boqun
> > - RELEASE -> ACQUIRE can be upgraded to a full barrier (including
> > transitivity) using smp_mb__release_acquire(), either before RELEASE
> > or after ACQUIRE (but consistently [*]).
>
> Hmm, but we don't actually need this for RELEASE -> ACQUIRE, afaict. This
> is just needed for UNLOCK -> LOCK, and is exactly what RCU is currently
> using (for PPC only).
>
> Stepping back a second, I believe that there are three cases:
>
>
> RELEASE X -> ACQUIRE Y (same CPU)
> * Needs a barrier on TSO architectures for full ordering
>
> UNLOCK X -> LOCK Y (same CPU)
> * Needs a barrier on PPC for full ordering
>
> RELEASE X -> ACQUIRE X (different CPUs)
> UNLOCK X -> ACQUIRE X (different CPUs)
> * Fully ordered everywhere...
> * ... but needs a barrier on PPC to become a full barrier
>
>