Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()

From: Paul E. McKenney
Date: Mon Nov 02 2015 - 20:58:12 EST

On Mon, Nov 02, 2015 at 11:17:17AM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2015 at 10:37 AM, Will Deacon <will.deacon@xxxxxxx> wrote:
> > On Mon, Nov 02, 2015 at 10:08:24AM -0800, Linus Torvalds wrote:
> >> On Mon, Nov 2, 2015 at 5:29 AM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >> > +#define smp_cond_acquire(cond) do { \
> >> > + while (!(cond)) \
> >> > + cpu_relax(); \
> >> > + smp_read_barrier_depends(); /* ctrl */ \
> >> > + smp_rmb(); /* ctrl + rmb := acquire */ \
> >> > +} while (0)
> >>
> >> This code makes absolutely no sense.
> >>
> >> smp_read_barrier_depends() is about a memory barrier where there is a
> >> data dependency between two accesses. The "depends" is very much about
> >> the data dependency, and very much about *nothing* else.
> >
> > Paul wasn't so sure, which I think is why smp_read_barrier_depends()
> > is already used in, for example, READ_ONCE_CTRL:
> >
> >
> Quoting the alpha architecture manual is kind of pointless, when NO
> OTHER ARCHITECTURE OUT THERE guararantees that whole "read +
> conditional orders wrt subsequent writes" _either_.
> (Again, with the exception of x86, which has the sane "we honor causality")
> Alpha isn't special. And smp_read_barrier_depends() hasn't magically
> become something new.

The existing control dependencies (READ_ONCE_CTRL() and friends) only
guarantee ordering against later stores, and not against later loads.
Of the weakly ordered architectures, only Alpha fails to respect
load-to-store control dependencies.

> If people think that control dependency needs a memory barrier on
> alpha, then it damn well needs on on all other weakly ordered
> architectuers too, afaik.

For load-to-load control dependencies, agreed, from what I can see,
all the weakly ordered architectures do need an explicit barrier.

> Either that "you cannot finalize a write unless all conditionals it
> depends on are finalized" is true or it is not. That argument has
> *never* been about some architecture-specific memory ordering model,
> as far as I know.
> As to READ_ONCE_CTRL - two wrongs don't make a right.
> That smp_read_barrier_depends() there doesn't make any sense either.
> And finally, the alpha architecture manual actually does have the
> notion of "Dependence Constraint" ( that talks about writes
> that depend on previous reads (where "depends" is explicitly spelled
> out to be about conditionals, write data _or_ write address). They are
> actually constrained on alpha too.

I am in India and my Alpha Architecture Manual is in the USA. Google
Books has a PDF, but it conveniently omits that particular section.
I do remember quoting that section at the Alpha architects back in the
late 1990s and being told that it didn't mean what I thought it meant.

And they did post a clarification on the web:

For instance, your producer must issue a "memory barrier"
instruction after writing the data to shared memory and before
inserting it on the queue; likewise, your consumer must issue a
memory barrier instruction after removing an item from the queue
and before reading from its memory. Otherwise, you risk seeing
stale data, since, while the Alpha processor does provide coherent
memory, it does not provide implicit ordering of reads and writes.
(That is, the write of the producer's data might reach memory
after the write of the queue, such that the consumer might read
the new item from the queue but get the previous values from
the item's memory.

So apparently does not sanction data dependency ordering.

> Note that a "Dependence Constraint" is not a memory barrier, because
> it only affects that particular chain of dependencies. So it doesn't
> order other things in *general*, but it does order a particular read
> with a particular sef of subsequent write. Which is all we guarantee
> on anything else too wrt the whole control dependencies.
> The smp_read_barrier_depends() is a *READ BARRIER*. It's about a
> *read* that depends on a previous read. Now, it so happens that alpha
> doesn't have a "read-to-read" barrier, and it's implemented as a full
> barrier, but it really should be read as a "smp_rmb().
> Yes, yes, alpha has the worst memory ordering ever, and the worst
> barriers for that insane memory ordering, but that still does not make
> alpha "magically able to reorder more than physically or logically
> possible". We don't add barriers that don't make sense just because
> the alpha memory orderings are insane.
> Paul? I really disagree with how you have totally tried to re-purpose
> smp_read_barrier_depends() in ways that are insane and make no sense.

I did consider adding a new name, but apparently was lazy that day.
I would of course be happy to add a separate name.

> That is not a control dependency. If it was, then PPC and ARM would
> need to make it a real barrier too. I really don't see why you have
> singled out alpha as the victim of your "let's just randomly change
> the rules".

Just trying to make this all work on Alpha as well as the other
architectures... But if the actual Alpha hardware that is currently
running recent Linux kernels is more strict than the architecture, I of
course agree that we could code to the actual hardware rather than to
the architecture. They aren't making new Alphas, so we could argue
thta the usual forward-compatibility issues do not apply.

And I have not been able to come up with an actual scenario that
allows real Alpha hardware to reorder a store with a prior load that it
control-depends on. Then again, I wasn't able to come up with the
split-cache scenario that breaks data dependencies before my lengthy
argument^Wdiscussion with the Alpha architects.

> > In this case, control dependencies are only referring to READ -> WRITE
> > ordering, so they are honoured by ARM and PowerPC.
> Do ARM and PPC actually guarantee the generic "previous reads always
> order before subsequent writes"?

Almost. PPC instead guarantees ordering between a previous read and a
later store, but only if they are separated (at runtime) by a conditional
branch that depends on the read.

> Because if that's the issue, then we should perhaps add a new ordering
> primitive that says that (ie "smp_read_to_write_barrier()"). That
> could be a useful thing in general, where we currently use "smp_mb()"
> to order an earlier read with a later write.

I agree that smp_read_to_write_barrier() would be useful, and on PPC
it could use the lighter-weight lwsync instruction. But that is a
different case than the load-to-store control dependencies, which PPC
(and I believe also ARM) enforce without a memory-barrier instruction.

Thanx, Paul

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at