Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers
From: Arnd Bergmann
Date: Tue May 18 2021 - 11:42:18 EST
On Tue, May 18, 2021 at 4:56 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Tue, May 18, 2021 at 12:27 AM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
> > >
> > > I wonder if the kernel should do the same, or whether there are still cases
> > > where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but
> > > it was fixed in gcc 6.
> >
> > It would have to be memmove(), not memcpy() in this case, right?
>
> No, it would simply be something like
>
> #define __get_unaligned_t(type, ptr) \
> ({ type __val; memcpy(&__val, ptr, sizeof(type)); __val; })
>
> #define get_unaligned(ptr) \
> __get_unaligned_t(typeof(*(ptr)), ptr)
>
> but honestly, the likelihood that the compiler generates something
> horrible (possibly because of KASAN etc) is uncomfortably high.
>
> I'd prefer the __packed thing. We don't actually use -O3, and it's
> considered a bad idea, and the gcc bug is as such less likely than
> just the above generating unacceptable code (we have several cases
> where "bad code generation" ends up being an actual bug, since we
> depend on inlining and depend on some code sequences not generating
> calls etc).
I think the important question is whether we know that the bug that Eric
pointed to can only happen with -O3, or whether it is something in
gcc-10.1 that got triggered by -O3 -msse on x86-64 but could equally
well get triggered on some other architecture without -O3 or
vector instructions enabled.
>From the gcc fix at
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9fa5b473b5b8e289b
it looks like this code path is entered when compiling with
-ftree-loop-vectorize, which is documented as
'-ftree-loop-vectorize'
Perform loop vectorization on trees. This flag is enabled by
default at '-O3' and by '-ftree-vectorize', '-fprofile-use', and
'-fauto-profile'.
-ftree-vectorize is set in arch/arm/lib/xor-neon.c
-O3 is set for the lz4 and zstd compression helpers and for wireguard.
To be on the safe side, we could pass -fno-tree-loop-vectorize along
with -O3 on the affected gcc versions, or use a bigger hammer
(not use -O3 at all, always set -fno-tree-loop-vectorize, ...).
Arnd