Re: Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness

From: Ard Biesheuvel
Date: Wed Oct 19 2016 - 11:00:02 EST


On 19 October 2016 at 14:35, Will Deacon <will.deacon@xxxxxxx> wrote:
> Hi Ard,
>
> On Mon, Oct 17, 2016 at 08:43:19PM +0100, Ard Biesheuvel wrote:
>> On 17 October 2016 at 19:38, Will Deacon <will.deacon@xxxxxxx> wrote:
>> > I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
>> > believe that the new compiler behaviour at the heart of the problem
>> > has the potential to affect other architectures and other pieces of
>> > kernel code relying on dead-code elimination to remove deliberately
>> > undefined functions.
>> >
>> > The failure looks like:
>> >
>> > | drivers/built-in.o: In function `armada_3700_add_composite_clk':
>> > |
>> > | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
>> > | undefined reference to `____ilog2_NaN'
>> > |
>> > | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
>> > | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
>> > | `____ilog2_NaN'
>> > |
>> > | make: *** [vmlinux] Error 1
>> >
>> > and if we look at the source for armada_3700_add_composite_clk, we see
>> > that this is caused by:
>> >
>> > int table_size = 0;
>> >
>> > rate->reg = reg + (u64)rate->reg;
>> > for (clkt = rate->table; clkt->div; clkt++)
>> > table_size++;
>> > rate->width = order_base_2(table_size);
>> >
>> > order_base_2 calls ilog2, which has the ____ilog2_NaN call:
>> >
>> > #define ilog2(n) \
>> > ( \
>> > __builtin_constant_p(n) ? ( \
>> > (n) < 1 ? ____ilog2_NaN() : \
>> >
>> > This is because we're in a curious case where GCC has emitted a
>> > special-cased version of armada_3700_add_composite_clk, with table_size
>> > effectively constant-folded as 0. Whilst we shouldn't see this in a
>> > non-buggy kernel (hence the deliberate call to the undefined function
>> > ____ilog2_NaN), it means that the final link fails because we have a
>> > ____ilog2_NaN in the code, with a runtime check on table_size.
>> >
>>
>> This is indeed an unintended side effect, but I would not call it
>> weird behaviour at all. The code in its current form does not handle
>> the case where it could end up passing 0 into order_base_2(), and we
>> simply need to handle that case.
>
> The reasons I think it's weird are:
>
> (1) The optimisation doesn't generate better code in this case --
> optimising for the table_size == 0 case is uninformed, particularly
> as that *cannot* happen at runtime (GCC probably can't tell, due
> to things like container_of, but all the clock data is static).
>

AFAICT, the references to the static clock data are indirected via
of_device_get_match_data(), which means there is no way the compiler
can prove that table_size is always non-zero.

> (2) __builtin_constant_p(n) could be interpreted by a developer as
> "this code will execute with a constant n at runtime". With this
> issue, GCC could (in theory) generate a specialisation for every
> possible value of a variable, and return __builtin_constant_p as
> true for all of them, which somewhat undermines the point of the
> builtin.
>

Yes, and that would be perfectly legal from a correctness point of
view, and would likely help performance as well. By using
__builtin_constant_p(), you are choosing to perform a build time
evaluation of an expression that would ordinarily be evaluated only at
runtime. This implies that you have to address undefined behavior at
build time rather than at runtime as well.

>> If order_base_2() is not defined for input 0, it should BUG() in that
>> case, and the associated __builtin_unreachable() should prevent the
>> special version from being emitted. If order_base_2() is defined for input
>> 0, it should not invoke ilog2() with that argument, and the problem should
>> go away as well.
>
> I don't necessarily think it should BUG() if it's not defined for input
> 0; things like __ffs don't do that and we'd be introducing conditional
> checks for cases that should not happen. The comment above order_base_2
> does suggest that ob2(0) should return 0, but it can actually end up
> invoking ilog2(-1), which is obviously wrong.
>
> I could update the comment, but that doesn't fix the build issue.
>

Fixing roundup_pow_of_two() [which is arguably incorrect] would
probably fix the build issue as well, no?

diff --git a/include/linux/log2.h b/include/linux/log2.h
index fd7ff3d91e6a..8a4be5e4223b 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -168,7 +168,7 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
#define roundup_pow_of_two(n) \
( \
__builtin_constant_p(n) ? ( \
- (n == 1) ? 1 : \
+ (n <= 1) ? 1 : \
(1UL << (ilog2((n) - 1) + 1)) \
) : \
__roundup_pow_of_two(n) \