Re: [PATCH v2 07/13] asm-generic: unaligned always use struct helpers

From: Arnd Bergmann
Date: Tue May 18 2021 - 03:27:11 EST


On Mon, May 17, 2021 at 11:53 PM Eric Biggers <ebiggers@xxxxxxxxxx> wrote:
> On Fri, May 14, 2021 at 12:00:55PM +0200, Arnd Bergmann wrote:
> > From: Arnd Bergmann <arnd@xxxxxxxx>
> >
> > As found by Vineet Gupta and Linus Torvalds, gcc has somewhat unexpected
> > behavior when faced with overlapping unaligned pointers. The kernel's
> > unaligned/access-ok.h header technically invokes undefined behavior
> > that happens to usually work on the architectures using it, but if the
> > compiler optimizes code based on the assumption that undefined behavior
> > doesn't happen, it can create output that actually causes data corruption.
> >
> > A related problem was previously found on 32-bit ARMv7, where most
> > instructions can be used on unaligned data, but 64-bit ldrd/strd causes
> > an exception. The workaround was to always use the unaligned/le_struct.h
> > helper instead of unaligned/access-ok.h, in commit 1cce91dfc8f7 ("ARM:
> > 8715/1: add a private asm/unaligned.h").
> >
> > The same solution should work on all other architectures as well, so
> > remove the access-ok.h variant and use the other one unconditionally on
> > all architectures, picking either the big-endian or little-endian version.
>
> FYI, gcc 10 had a bug where it miscompiled code that uses "packed structs" to
> copy between overlapping unaligned pointers
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94994).

Thank you for pointing this out

> I'm not sure whether the kernel will run into that or not, and gcc has since
> fixed it. But it's worth mentioning, especially since the issue mentioned in
> this commit sounds very similar (overlapping unaligned pointers), and both
> involved implementations of DEFLATE decompression.

I tried reproducing this on the kernel deflate code with the kernel.org gcc-10.1
and gcc-10.3 crosstool versions but couldn't quite get there with Vineet's
preprocessed source https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100363

Trying with both the original get_unaligned() version in there and the
packed-struct
variant, I get the same output from gcc-10.1 and gcc-10.3 when I compile those
myself for arc hs4x , but it's rather different from the output that Vineet got
and I don't know how to spot whether the problem exists in any of those
versions.

> Anyway, partly due to the above, in userspace I now only use memcpy() to
> implement {get,put}_unaligned_*, since these days it seems to be compiled
> optimally and have the least amount of problems.
>
> I wonder if the kernel should do the same, or whether there are still cases
> where memcpy() isn't compiled optimally. armv6/7 used to be one such case, but
> it was fixed in gcc 6.

It would have to be memmove(), not memcpy() in this case, right?
My feeling is that if gcc-4.9 and gcc-5 produce correct but slightly slower
code, we can live with that, unlike the possibility of gcc-10.{1,2} producing
incorrect code.

Since the new asm/unaligned.h has a single implementation across all
architectures, we could probably fall back to a memmove based version for
the compilers affected by the 94994 bug, but I'd first need to have a better
way to test regarding whether given combination of asm/unaligned.h and
compiler version runs into this bug.

I have checked your reproducer and confirmed that it does affect x86_64
gcc-10.1 -O3 with my proposed version of asm-generic/unaligned.h, but
does not trigger on any other version (4.9 though 9.3, 10.3 or 11.1), and not
on -O2 or "-O3 -mno-sse" builds or on arm64, but that doesn't necessarily
mean it's safe on these.

Arnd