Re: [RFC][PATCH 0/5] arch: atomic rework
From: Paul E. McKenney
Date: Fri Feb 07 2014 - 13:02:31 EST
On Fri, Feb 07, 2014 at 04:55:48PM +0000, Will Deacon wrote:
> Hi Paul,
>
> On Fri, Feb 07, 2014 at 04:50:28PM +0000, Paul E. McKenney wrote:
> > On Fri, Feb 07, 2014 at 08:44:05AM +0100, Peter Zijlstra wrote:
> > > On Thu, Feb 06, 2014 at 08:20:51PM -0800, Paul E. McKenney wrote:
> > > > Hopefully some discussion of out-of-thin-air values as well.
> > >
> > > Yes, absolutely shoot store speculation in the head already. Then drive
> > > a wooden stake through its hart.
> > >
> > > C11/C++11 should not be allowed to claim itself a memory model until that
> > > is sorted.
> >
> > There actually is a proposal being put forward, but it might not make ARM
> > and Power people happy because it involves adding a compare, a branch,
> > and an ISB/isync after every relaxed load... Me, I agree with you,
> > much preferring the no-store-speculation approach.
>
> Can you elaborate a bit on this please? We don't permit speculative stores
> in the ARM architecture, so it seems counter-intuitive that GCC needs to
> emit any additional instructions to prevent that from happening.
Requiring a compare/branch/ISB after each relaxed load enables a simple(r)
proof that out-of-thin-air values cannot be observed in the face of any
compiler optimization that refrains from reordering a prior relaxed load
with a subsequent relaxed store.
> Stores can, of course, be observed out-of-order but that's a lot more
> reasonable :)
So let me try an example. I am sure that Torvald Riegel will jump in
with any needed corrections or amplifications:
Initial state: x == y == 0
T1: r1 = atomic_load_explicit(x, memory_order_relaxed);
atomic_store_explicit(r1, y, memory_order_relaxed);
T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
atomic_store_explicit(r2, x, memory_order_relaxed);
One would intuitively expect r1 == r2 == 0 as the only possible outcome.
But suppose that the compiler used specialization optimizations, as it
would if there was a function that has a very lightweight implementation
for some values and a very heavyweight one for other. In particular,
suppose that the lightweight implementation was for the value 42.
Then the compiler might do something like the following:
Initial state: x == y == 0
T1: r1 = atomic_load_explicit(x, memory_order_relaxed);
if (r1 == 42)
atomic_store_explicit(42, y, memory_order_relaxed);
else
atomic_store_explicit(r1, y, memory_order_relaxed);
T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
atomic_store_explicit(r2, x, memory_order_relaxed);
Suddenly we have an explicit constant 42 showing up. Of course, if
the compiler carefully avoided speculative stores (as both Peter and
I believe that it should if its code generation is to be regarded as
anything other than an act of vandalism, the words in the standard
notwithstanding), there would be no problem. But currently, a number
of compiler writers see absolutely nothing wrong with transforming
the optimized-for-42 version above with something like this:
Initial state: x == y == 0
T1: r1 = atomic_load_explicit(x, memory_order_relaxed);
atomic_store_explicit(42, y, memory_order_relaxed);
if (r1 != 42)
atomic_store_explicit(r1, y, memory_order_relaxed);
T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
atomic_store_explicit(r2, x, memory_order_relaxed);
And then it is a short and uncontroversial step to the following:
Initial state: x == y == 0
T1: atomic_store_explicit(42, y, memory_order_relaxed);
r1 = atomic_load_explicit(x, memory_order_relaxed);
if (r1 != 42)
atomic_store_explicit(r1, y, memory_order_relaxed);
T2: r2 = atomic_load_explicit(y, memory_order_relaxed);
atomic_store_explicit(r2, x, memory_order_relaxed);
This can of course result in r1 == r2 == 42, even though the constant
42 never appeared in the original code. This is one way to generate
an out-of-thin-air value.
As near as I can tell, compiler writers hate the idea of prohibiting
speculative-store optimizations because it requires them to introduce
both control and data dependency tracking into their compilers. Many of
them seem to hate dependency tracking with a purple passion. At least,
such a hatred would go a long way towards explaining the incomplete
and high-overhead implementations of memory_order_consume, the long
and successful use of idioms based on the memory_order_consume pattern
notwithstanding [*]. ;-)
That said, the Java guys are talking about introducing something
vaguely resembling memory_order_consume (and thus resembling the
rcu_assign_pointer() and rcu_dereference() portions of RCU) to solve Java
out-of-thin-air issues involving initialization, so perhaps there is hope.
Thanx, Paul
[*] http://queue.acm.org/detail.cfm?id=2488549
http://www.rdrop.com/users/paulmck/RCU/rclockpdcsproof.pdf
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/