Re: [GIT PULL] x86/build changes for v4.17
From: Linus Torvalds
Date: Wed Apr 04 2018 - 13:13:55 EST
On Wed, Apr 4, 2018 at 9:49 AM, Nick Desaulniers
<ndesaulniers@xxxxxxxxxx> wrote:
>
> It's definitely something curious that I'll need to sit down and investigate
> more. If there are other known instances, it would be good to let me know.
Note that we explicitly use "-fno-delete-null-pointer-checks" because
we do *not* want NULL pointer check removal even in the case of a bug.
See commit a3ca86aea507 ("Add '-fno-delete-null-pointer-checks' to gcc
CFLAGS") for the reason: we had buggy code that accessed a pointer
before the NULL pointer check, but the bug was "benign" as long as the
compiler didn't actually remove the check.
And note how the buggy code *looked* safe. It was doing the right
check, it was just that the check was hidden and disabled by another
bug.
Removing the NULL pointer check turned a benign bug into a trivially
exploitable one by just mapping user space data at NULL (which avoided
the kernel oops, and then made the kernel use the user value!).
Now, admittedly we have a ton of other security features in place to
avoid these kinds of issues - SMAP helps on the hardware side, and not
allowing users to mmap() NULL in the first place helps with most
distributions, but the point remains: the kernel generally really
doesn't want optimizations that are perhaps allowed by the standard,
but that result in code generation that doesn't match the source code.
The NULL pointer removal is one such thing: don't remove checks in the
kernel based on "standard says". It's ok to do optimizations that are
based on "hardware does the exact same thing", but not on "the
standard says this is undefined so we can remove it".
Other examples of where the kernel doesn't want the compiler to do
"the standard says this is undefined so I can take shortcuts" include:
-fno-strict-aliasing: the standard is just wrong and full of shit,
and the misguided type-based aliasing can cause serious problems for
the kernel (but also other code)
-fno-strict-overflow: again, this is a stupid optimization that
purely depends on the compiler generating faster code by generating
incorrect code.
The one I'm actually upset about is when a compiler goes even
*further* and does things that are NOT EVEN allowed by the paper
standard, much less by real code.
The fact that clang by default enables "-fmerge-all-constants"
behavior is just inexcusable. That's not just "let's do invalid
optimizations based on undefined behavior". That's an actual "let's do
known invalid optimizations that are explicitly disallowed even by the
standard".
Linus