Re: [PATCH] x86/asm/entry: fix stack return address retrieval in thunk

From: Linus Torvalds
Date: Tue May 17 2016 - 12:31:24 EST


On Tue, May 17, 2016 at 7:43 AM, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>
> index 98df1fa..dae7ca0 100644
> --- a/arch/x86/entry/thunk_64.S
> +++ b/arch/x86/entry/thunk_64.S
> @@ -15,9 +15,10 @@
> .globl \name
> .type \name, @function
> \name:
> + /* push 1 register if frame pointers are enabled */
> FRAME_BEGIN
>
> - /* this one pushes 9 elems, the next one would be %rIP */
> + /* push 9 registers */

I don't hate this patch, but quite frankly, as with the other case,
I'd just make the frame pointer be unconditional in this case.

If we push nine other registers, the frame pointer setup code is *not*
going to matter.

The reason to avoid frame pointers in code generation is two-fold:

1) for small leaf functions, it often ends up dominating

2) it removes a register that is otherwise usable, which can be
particularly bad on 32-bit x86 due to the much more limited number of
registers (and was apparently really noticeable on the older on-order
atom cores)

and in this case neither of them is really an issue.

So I would suggest that any case that actually depends on a frame
access just make the frame pointer not just unconditional, but
_explicit_.

So not just avoiding the macro because it's conditional, but write out
the sequence to actually set up the frame, and then use

- movq 9*8(%rsp), %rdi
+ movq 8(%rbp), %rdi # return address

to entirely avoid all kind of "how many registers have we pushed" math.

Considering that we got this wrong in two places, it's clearly too
subtle for our little brains as-is.

Linus