Re: [PATCH 00/22] add support for Clang LTO
From: Peter Zijlstra
Date: Tue Jun 30 2020 - 16:12:56 EST
On Tue, Jun 30, 2020 at 09:19:31PM +0200, Marco Elver wrote:
> I was asked for input on this, and after a few days digging through some
> history, thought I'd comment. Hope you don't mind.
Not at all, being the one that asked :-)
> First of all, I agree with the concerns, but not because of LTO.
>
> To set the stage better, and summarize the fundamental problem again:
> we're in the unfortunate situation that no compiler today has a way to
> _efficiently_ deal with C11's memory_order_consume
> [https://lwn.net/Articles/588300/]. If we did, we could just use that
> and be done with it. But, sadly, that doesn't seem possible right now --
> compilers just say consume==acquire.
I'm not convinced C11 memory_order_consume would actually work for us,
even if compilers implemented it efficiently. That is, given:
https://lore.kernel.org/lkml/20150520005510.GA23559@xxxxxxxxxxxxxxxxxx/
only pointers can have consume, but like I pointed out, we have code
that relies on dependent loads from integers.
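
To make "dependent loads from integers" concrete, here is a minimal
userspace sketch, not code taken from the kernel. The names slots,
cur_slot, publish() and read_current() are hypothetical, and a relaxed
C11 atomic load stands in for the kernel's READ_ONCE(). The point is
only that the address of the second load is computed from an integer
rather than from a pointer:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NSLOTS 4

static int slots[NSLOTS];       /* payload, written before publishing */
static atomic_size_t cur_slot;  /* published index: an integer, not a pointer */

/* Writer: fill the slot, then publish its index with release semantics. */
static void publish(size_t idx, int value)
{
	assert(idx < NSLOTS);
	slots[idx] = value;
	atomic_store_explicit(&cur_slot, idx, memory_order_release);
}

/*
 * Reader: the address of slots[idx] depends on the loaded integer.
 * The relaxed load gives no ordering guarantee in the C11 model; code
 * written this way relies on the hardware ordering address-dependent
 * loads, and on the compiler not breaking the dependency.
 */
static int read_current(void)
{
	size_t idx = atomic_load_explicit(&cur_slot, memory_order_relaxed);

	return slots[idx];
}
```

Single-threaded, publish(2, 42) followed by read_current() returns 42;
the ordering question only arises once the reader runs concurrently
with the writer.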
> Will suggests doing the same in the
> kernel: https://lkml.kernel.org/r/20200630173734.14057-19-will@xxxxxxxxxx
PowerPC would need a similar thing; it too will not preserve causality
for control dependencies.
> What we're most worried about right now is the existence of compiler
> transformations that could break data dependencies by e.g. turning them
> into control dependencies.
Correct.
> If this is a real worry, I don't think LTO is the magical feature that
> will uncover those optimizations. If these compiler transformations are
> real, they also exist in a normal build!
Agreed, _however_ with the caveat that LTO could make them more common.
After all, with whole program analysis, the compiler might be able to
more easily determine that our pointer @ptr is only ever assigned the
values of &A, &B or &C, while without that visibility it would not be
able to determine this.
Once it knows @ptr has a limited number of determined values, the
conversion into control dependencies becomes much more likely.
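
As an illustration of the conversion in question, here is a
hypothetical sketch using C11 atomics in place of READ_ONCE(). The
struct and function names are made up, and read_converted() is written
by hand to show what a compiler could legally produce once it has
proven the set of possible values; it is not output from any actual
compiler:

```c
#include <stdatomic.h>

struct obj { int val; };

static struct obj A = { 1 }, B = { 2 }, C = { 3 };
static struct obj *_Atomic ptr = &A;

/*
 * As written: the address of the second load comes from the first
 * load, so the two loads are linked by an address dependency.
 */
static int read_dependent(void)
{
	struct obj *p = atomic_load_explicit(&ptr, memory_order_relaxed);

	return p->val;
}

/*
 * After the hypothetical conversion: the loaded pointer is only
 * compared, and each branch loads from a fixed address. The loads of
 * A.val, B.val and C.val no longer depend on p, only on the branch
 * outcome, which the CPU is free to speculate past.
 */
static int read_converted(void)
{
	struct obj *p = atomic_load_explicit(&ptr, memory_order_relaxed);

	if (p == &A)
		return A.val;
	if (p == &B)
		return B.val;
	return C.val;
}
```

Both functions return the same value in a single thread, which is why
the rewrite is valid as far as the compiler is concerned. They differ
only in what a weakly ordered CPU may do when another thread publishes
a new pointer concurrently.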
> And if we are worried about them, we need to stop relying on dependent
> load ordering across the board; or switch to -O0 for everything.
> Clearly, we don't want either.
Agreed.
> Why do we think LTO is special?
As argued above, whole-program analysis would make it more likely. But I
agree the fundamental problem exists independently of LTO.
> But as far as we can tell, there is no evidence of the dreaded "data
> dependency to control dependency" conversion with LTO that isn't there
> in non-LTO builds, if it's even there at all. Has the data to control
> dependency conversion been encountered in the wild? If not, is the
> resulting reaction an overreaction? If so, we need to be careful blaming
> LTO for something that it isn't even guilty of.
It is mostly paranoia; in large part driven by the fact that even if
such a conversion were to be done, it could go a very long time without
actually causing problems, and longer still for such problems to be
traced back to such an 'optimization'.
That is, the collective hurt from debugging too many ordering issues.
> So, we are probably better off untangling LTO from the story:
>
> 1. LTO or no LTO does not matter. The LTO series should not get tangled
> up with memory model issues.
>
> 2. The memory model question and problems need to be answered and
> addressed separately.
>
> Thoughts?
How hard would it be to create something that analyzes a build and
looks for all 'dependent load -> control dependency' transformations
headed by a volatile (and/or from asm) load and issues a warning for
them?
This would give us an indication of how valuable this transformation is
for the kernel. I'm hoping/expecting it's vanishingly rare, but what do
I know.