Re: [PATCH 0/7] ARM: hacks for link-time optimization

From: Paul E. McKenney
Date: Fri Dec 21 2018 - 09:23:28 EST


On Tue, Dec 18, 2018 at 11:00:14AM +0100, Peter Zijlstra wrote:
> On Tue, Dec 18, 2018 at 10:18:24AM +0100, Peter Zijlstra wrote:
> > In particular turning an address-dependency into a control-dependency,
> > which is something allowed by the C language, since it doesn't recognise
> > these concepts as such.
> >
> > The 'optimization' is allowed currently, but LTO will make it much more
> > likely since it will have a much wider view of things. Esp. when combined
> > with PGO.
> >
> > Specifically; if you have something like:
> >
> > 	int idx;
> > 	struct object objs[2];
> >
> > the statement:
> >
> > 	val = objs[idx & 1].ponies;
> >
> > which you 'need' to be translated like:
> >
> > 	struct object *obj = objs;
> > 	obj += (idx & 1);
> > 	val = obj->ponies;
> >
> > Such that the load of obj->ponies depends on the load of idx. However
> > our dear compiler is allowed to make it:
> >
> > 	if (idx & 1)
> > 		obj = &objs[1];
> > 	else
> > 		obj = &objs[0];
> >
> > 	val = obj->ponies;
> >
> > Because C doesn't recognise this as being different. However this is
> > utterly broken, because in this translation we can speculate the load
> > of obj->ponies such that it no longer depends on the load of idx, which
> > breaks RCU.

Hence the following in Documentation/RCU/rcu_dereference.txt:

	You are only permitted to use rcu_dereference on pointer values.
	The compiler simply knows too much about integral values to
	trust it to carry dependencies through integer operations.

I got rid of the carrying of dependencies via non-pointers in 2014.
You are telling me that they have crept back? Sigh!!! :-/
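
For illustration only, here is a minimal userspace sketch of the
pointer-carried form, assuming C11 atomics as a stand-in for the kernel
primitives. The names cur_obj, publish() and read_ponies() are made up
for this example, and memory_order_consume is only the nearest C11
analogue of rcu_dereference(), which current compilers are known to
implement as acquire.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct object {
	int ponies;
};

static struct object objs[2] = { { .ponies = 10 }, { .ponies = 20 } };

/* Hypothetical shared pointer, standing in for an RCU-protected pointer. */
static _Atomic(struct object *) cur_obj = &objs[0];

/* Updater: publish a fully initialised object (hypothetical helper). */
static void publish(struct object *obj)
{
	assert(obj != NULL);
	atomic_store_explicit(&cur_obj, obj, memory_order_release);
}

/*
 * Reader: the dependency runs from the pointer load to the field load,
 * with no integer index for the compiler to reason about.
 */
static int read_ponies(void)
{
	struct object *obj =
		atomic_load_explicit(&cur_obj, memory_order_consume);

	return obj->ponies;
}
```

The point of the sketch is that the reader never sees an integer it could
branch on: the only value it loads is the pointer itself.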

Thanx, Paul

> > Note that further 'optimization' is possible and the compiler could even
> > make it:
> >
> > 	if (idx & 1)
> > 		val = objs[1].ponies;
> > 	else
> > 		val = objs[0].ponies;
>
> A variant that is actually broken on x86 too (due to issuing the loads
> in the 'wrong' order):
>
> 	val = objs[0].ponies;
> 	if (idx & 1)
> 		val = objs[1].ponies;
>
> Which is a translation that makes sense if we either marked
> unlikely(idx & 1) or if PGO found the same.
>
> > Now, granted, this is a fairly artificial example, but it does
> > illustrate the exact problem.
> >
> > The more the compiler can see of the complete program, the more likely
> > it can make inferences like this, esp. when coupled with PGO.
> >
> > Now, we're (usually) very careful to wrap things in READ_ONCE() and
> > rcu_dereference() and the like, which makes it harder on the compiler
> > (because 'volatile' is special), but nothing really stops it from doing
> > this.
> >
> > Paul has been trying to beat clue into the language people, but given
> > he's been at it for 10 years now, and there's no resolution, I figure we
> > ought to get compiler implementations to give us a knob.
>
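
To make the 'volatile' remark above concrete, here is a rough userspace
sketch of the index-based pattern, assuming a simplified int-only
stand-in for READ_ONCE(). READ_ONCE_INT(), set_idx() and
read_ponies_by_index() are hypothetical names, not kernel APIs. The
volatile access pins down the load of idx itself, but says nothing about
how the dependent indexing is lowered.

```c
#include <assert.h>

struct object {
	int ponies;
};

static struct object objs[2] = { { .ponies = 10 }, { .ponies = 20 } };
static int idx;

/* Hypothetical, int-only stand-in for the kernel's READ_ONCE(). */
#define READ_ONCE_INT(x) (*(const volatile int *)&(x))

/* Hypothetical helper so the example can change the index. */
static void set_idx(int v)
{
	assert(v >= 0);
	idx = v;
}

static int read_ponies_by_index(void)
{
	/* The volatile access is a single load of idx. */
	int i = READ_ONCE_INT(idx);

	/*
	 * Nothing in the C standard requires this to stay an address
	 * computation; the compiler may turn it into a branch on (i & 1).
	 */
	return objs[i & 1].ponies;
}
```

The result is the same either way in a single-threaded run; the concern
in the quoted text is only about the ordering seen by concurrent readers
on weakly ordered hardware.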