Re: [PATCH 0/7] ARM: hacks for link-time optimization
From: Paul E. McKenney
Date: Fri Dec 21 2018 - 13:00:09 EST
On Fri, Dec 21, 2018 at 09:20:44AM -0800, Andi Kleen wrote:
> > In particular turning an address-dependency into a control-dependency,
> > which is something allowed by the C language, since it doesn't recognise
> > these concepts as such.
> >
> > The 'optimization' is allowed currently, but LTO will make it much more
> > likely since it will have a much wider view of things. Esp. when combined
> > with PGO.
> >
> > Specifically; if you have something like:
> >
> > int idx;
> > struct object objs[2];
> >
> > the statement:
> >
> > val = objs[idx & 1].ponies;
> >
> > which you 'need' to be translated like:
> >
> > struct object *obj = objs;
> > obj += (idx & 1);
> > val = obj->ponies;
> >
> > Such that the load of obj->ponies depends on the load of idx. However
> > our dear compiler is allowed to make it:
> >
> > if (idx & 1)
> > obj = &objs[1];
> > else
> > obj = &objs[0];
> >
> > val = obj->ponies;
>
> I don't see why a compiler would do such an optimization. Clearly
> the second variant is worse than the first, bigger and needs
> branch prediction resources.
>
> In fact compilers usually try hard to go into the other direction
> and apply if conversion.
>
> Has anyone seen real world examples of such changes being done, or is this
> all language lawyering theory?
I have not seen it myself, but I have heard others claim to. For example,
if "idx & 1" had to be computed for some other reason, especially if there
was a pre-exiting "if" statement with this as its condition. Or if you
have hardware that has a conditional-move instruction. And so on.
Do you have a way to guarantee that it never happens?
Thanx, Paul
> -Andi
>
> >
> > Because C doesn't recognise this as being different. However this is
> > utterly broken, because in this translation we can speculate the load
> > of obj->ponies such that it no longer depends on the load of idx, which
> > breaks RCU.
> >
> > Note that further 'optimization' is possible and the compiler could even
> > make it:
> >
> > if (idx & 1)
> > val = objs[1].ponies;
> > else
> > val = objs[0].ponies;
> >
> > Now, granted, this is a fairly artificial example, but it does
> > illustrate the exact problem.
> >
> > The more the compiler can see of the complete program, the more likely
> > it can make inferrences like this, esp. when coupled with PGO.
> >
> > Now, we're (usually) very careful to wrap things in READ_ONCE() and
> > rcu_dereference() and the like, which makes it harder on the compiler
> > (because 'volatile' is special), but nothing really stops it from doing
> > this.
> >
> > Paul has been trying to beat clue into the language people, but given
> > he's been at it for 10 years now, and there's no resolution, I figure we
> > ought to get compiler implementations to give us a knob.
>