Re: gcc-10: kernel stack is corrupted and fails to boot

From: Nick Desaulniers
Date: Wed May 13 2020 - 20:51:14 EST


On Wed, May 13, 2020 at 5:11 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Wed, May 13, 2020 at 4:36 PM Borislav Petkov <bp@xxxxxxx> wrote:
> >
> >
> > Looking at them, they do have an mb() too so how about this then
> > instead?
> >
> > #define prevent_tail_call_optimization() mb()
>
> Yeah, I think a full mb() is likely safe, because that's pretty much
> always going to be a real instruction with real semantics, and no
> amount of link-time optimizations can move it around a call
> instruction.

Are you sure LTO treats empty asm statements differently than full
memory barriers with regard to preventing tail calls? (I'll take your
word for it, I don't actually know, but seeing an example of real code
run through a production compiler is much, much more convincing.)
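
A minimal sketch of the kind of test case that would settle this,
assuming GCC or Clang on a hosted target. The function names are made
up for illustration, and __sync_synchronize() is only a userspace
stand-in for the kernel's mb(). The idea is to build it at -O2, once
without and once with -flto, and look at whether each caller ends in a
jump or a call to callee(); no result is claimed here.

```c
/* Hypothetical test case; none of these names come from the kernel. */

/* Keep the callee out of line so the final call stays visible. */
__attribute__((noinline)) int callee(int x)
{
	return x * 2 + 1;
}

/* Baseline: the compiler is free to turn this into a tail call. */
int caller_plain(int x)
{
	return callee(x);
}

/* Variant 1: empty asm statement placed after the call. */
int caller_empty_asm(int x)
{
	int ret = callee(x);

	__asm__ __volatile__("");
	return ret;
}

/* Variant 2: full barrier after the call (stand-in for mb()). */
int caller_full_barrier(int x)
{
	int ret = callee(x);

	__sync_synchronize();
	return ret;
}
```

All three callers return the same value; only the generated code around
the last call is of interest.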

The TL;DR of the very long thread is that
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94722 is a proper fix, on
the GCC side. Adding arbitrary empty asm statements to work around
it? Hacks. Full memory barriers? Hacks.

I'm happy that GCC does an optimization that Clang does not. At the
same time, it sucks to pay a penalty for a bug we don't trigger. This
is the same reason why `asm_volatile_goto` expands differently between
GCC and Clang (and why I tried to undo that like a year ago).

If Clang implements the same tail-call optimization GCC is doing here
tomorrow, well, we already support
__attribute__((no_stack_protector)), which can be added to the callees
affected in this case (i.e. tail calls stay allowed). I
should send a patch adding that to include/linux/compiler_attributes.h
and annotate the callees in question, before we forget about this
issue.
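
As a sketch only, such a definition might look like the following. The
macro name __no_stack_protector and the annotated function are
assumptions for illustration, not existing kernel code, and the
__has_attribute() fallback is there so the snippet still builds on
compilers that lack the attribute.

```c
#include <string.h>

/* Older compilers have no __has_attribute(); treat everything as missing. */
#ifndef __has_attribute
# define __has_attribute(x) 0
#endif

/* Assumed macro name; expands to nothing where the attribute is unknown. */
#if __has_attribute(__no_stack_protector__)
# define __no_stack_protector __attribute__((__no_stack_protector__))
#else
# define __no_stack_protector
#endif

/*
 * Hypothetical stand-in for one of the functions in question. The local
 * array would normally make it a candidate for a stack canary under
 * -fstack-protector; the attribute asks the compiler to skip it here.
 */
__no_stack_protector size_t copy_and_measure(const char *src)
{
	char buf[64];

	/* Bounded copy, always NUL-terminated. */
	strncpy(buf, src, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	return strlen(buf);
}
```

Building this with -fstack-protector-all and comparing the output with
and without the annotation would show whether the canary check is
emitted.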

Sprinkling empty asm statements or full memory barriers should be
treated with the same hesitancy as adding sleep()s to "work around"
concurrency bugs. Red flag.

And LTO is fun; we've been shipping it in Android for years (and need
to attempt upstreaming again). Just today we found an ODR violation
in one of the most important symbols in the kernel. Will be sending a
patch for that tomorrow.

>
> I could imagine some completely UP in-order CPU that doesn't need to
> serialize with anything at all, and even "mb()" might be empty. I
> think you can compile old ARM kernels for that. But realistically I
> think we can ignore them at least for now - I'm not sure the link-time
> optimization will even do things like that tailcall conversion, and
> I'm not convinced that old pre-ARMv7 systems will be relevant by the
> time (if) it ever does.
>
> Linus



--
Thanks,
~Nick Desaulniers