HAVE_EFFICIENT_UNALIGNED_ACCESS on ARM32 (was: Alignment issues in zImage with Linux 4.12, LZ4 and GCC5.3)

From: Romain Izard
Date: Mon Sep 04 2017 - 12:19:50 EST


2017-07-24 13:07 GMT+02:00 Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>:
> On 24 July 2017 at 11:57, Romain Izard <romain.izard.pro@xxxxxxxxx> wrote:
>>
>> While upgrading the kernel from 4.9 to 4.12 for a custom board with a
>> Cortex-A5 based CPU, I have encountered a compilation issue that leads to
>> a data abort during the execution of the LZ4 decompression code in
>> zImage.
>>
>> [...]
>>
>> The compilation options are a little different between both cases:
>> The library is built with -O3, whereas the zImage decompressor is built
>> with -O2, -DDISABLE_BRANCH_PROFILING, -fpic, -mno-single-pic-base,
>> -fno-builtin. All other compilation options are shared in both cases.
>>

This is a red herring: the critical option here is '-fno-builtin'. If it is
not set, the bug disappears. It also disappears if we replace it with
'-fno-builtin-putc'. But it only changes the optimizations applied by
the compiler itself, and cannot explain the issue.

Before updating the LZ4 decompressor, the LZ4 header contained specific
code for handling alignment issues, which has been changed.

>> For Linux 4.9, the LZ4 decompressor code is completely different, which
>> explains why the issue appeared when changing kernel versions.
>>
>
> I see some void* to u32* casts in the new code, which makes me think
> that it is perhaps not valid C, and has maybe not been tested on an
> architecture that has stricter alignment requirements than x86?
>

I can reproduce it easily on v4.13 with GCC6.3:
- Configure with allnoconfig
- Enable CONFIG_MMU, CONFIG_KERNEL_LZ4
- Check the generated assembly for arch/arm/boot/compressed/decompress.o:
In the LZ4_decompress_fast function, the memory access after the third
branch uses ldm and stm. This is invalid, as the addresses can be unaligned.

With this configuration, HAVE_EFFICIENT_UNALIGNED_ACCESS is set, but this is
wrong. On 32-bit ARM, the compiler is free to generate LDM or LDRD access
that will always fail on unaligned addresses. In this case, we have two
LDR/STR access to adjascent addresses that appear in inline code. The
get_unaligned functions in "include/linux/unaligned/access_ok.h" cast the
pointers directly as regular 32-bit access, and as those are by default
aligned, the compiler will optimise and combine the access.

If we use the functions from "include/linux/unaligned/le_struct.h", the
get_unaligned() function correctly tells the compiler that the access is
special, and that it should not merge memory access. But we do not fall back
to byte-by-byte access, as the compiler itself knows how to use 32-bit
access when -funaligned-access is set (by default for ARMv7).

The issue is probably hidden by the kernel fault handler in normal kernel
code, but for this case it does nothing as we are working in the boot
decompressor, that cannot use the fault handler. But it should have a
performance inpact.

As a result, this means that HAVE_EFFICIENT_UNALIGNED_ACCESS should not
be set at least in the context of "include/asm-generic/unaligned.h". But
as this option is also used in other places, where it is not related to
the get_unaligned functions, it is not possible to remove it on ARM 32-bit
without further study.

--
Romain Izard