Re: Compilers and RCU readers: Once more unto the breach!

From: Paul E. McKenney
Date: Fri May 22 2015 - 14:55:41 EST


On Fri, May 22, 2015 at 06:30:29PM +0100, Will Deacon wrote:
> Hi Paul,
>
> On Thu, May 21, 2015 at 09:02:12PM +0100, Paul E. McKenney wrote:
> > On Thu, May 21, 2015 at 08:24:22PM +0100, Will Deacon wrote:
> > > On Wed, May 20, 2015 at 07:16:06PM +0100, Paul E. McKenney wrote:
> > > > On to #5:
> > > >
> > > > r1 = atomic_load_explicit(&x, memory_order_consume);
> > > > if (r1 == 42)
> > > > atomic_store_explicit(&y, r1, memory_order_relaxed);
> > > > ----------------------------------------------------
> > > > r2 = atomic_load_explicit(&y, memory_order_consume);
> > > > if (r2 == 42)
> > > > atomic_store_explicit(&x, 42, memory_order_relaxed);
> > > >
> > > > The first thread's accesses are dependency ordered. The second thread's
> > > > ordering is in a corner case that memory-barriers.txt does not cover.
> > > > You are supposed to start control dependencies with READ_ONCE_CTRL(), not
> > > > a memory_order_consume load (AKA rcu_dereference and friends). However,
> > > > Alpha would have a full barrier as part of the memory_order_consume load,
> > > > and the rest of the processors would (one way or another) respect the
> > > > control dependency. And the compiler would have some fun trying to
> > > > break it.
> > >
> > > But this is interesting because the first thread is ordered whilst the
> > > second is not, so doesn't that effectively forbid the compiler from
> > > constant-folding values if it can't prove that there is no dependency
> > > chain?
> >
> > You lost me on this one. Are you suggesting that the compiler
> > speculate the second thread's atomic store? That would be very
> > bad regardless of dependency chains.
> >
> > So what constant-folding optimization are you thinking of here?
> > If the above example is not amenable to such an optimization, could
> > you please give me an example where constant folding would apply
> > in a way that is sensitive to dependency chains?
>
> Unless I'm missing something, I can't see what would prevent a compiler
> from looking at the code in thread1 and transforming it into the code in
> thread2 (i.e. constant folding r1 with 42 given that the taken branch
> must mean that r1 == 42). However, such an optimisation breaks the
> dependency chain, which means that a compiler needs to walk backwards
> to see if there is a dependency chain extending to r1.

Indeed! Which is one reason that (1) integers are not allowed in
dependency chains with a very few extremely constrained exceptions and
(2) sequences of comparisons and/or undefined-behavior considerations
that allow the compiler to exactly determine the pointer value break
the dependency chain.

> > > > So the current Linux memory model would allow (r1 == 42 && r2 == 42),
> > > > but I don't know of any hardware/compiler combination that would
> > > > allow it. And no, I am -not- going to update memory-barriers.txt for
> > > > this litmus test, its theoretical interest notwithstanding! ;-)
>
> Of course, I'm not asking for that at all! I'm just trying to see how
> your proposal holds up with the example.

Whew! ;-)

Thanx, Paul

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/