Re: [tip:x86/asm] objtool: Track DRAP separately from callee-saved registers
From: Josh Poimboeuf
Date: Fri Aug 11 2017 - 12:57:21 EST
On Fri, Aug 11, 2017 at 09:22:11AM -0700, Andy Lutomirski wrote:
> On Fri, Aug 11, 2017 at 5:13 AM, tip-bot for Josh Poimboeuf
> <tipbot@xxxxxxxxx> wrote:
> > Commit-ID: bf4d1a83758368c842c94cab9661a75ca98bc848
> > Gitweb: http://git.kernel.org/tip/bf4d1a83758368c842c94cab9661a75ca98bc848
> > Author: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> > AuthorDate: Thu, 10 Aug 2017 16:37:26 -0500
> > Committer: Ingo Molnar <mingo@xxxxxxxxxx>
> > CommitDate: Fri, 11 Aug 2017 14:06:15 +0200
> >
> > objtool: Track DRAP separately from callee-saved registers
> >
> > When GCC realigns a function's stack, it sometimes uses %r13 as the DRAP
> > register, like:
> >
> >   push   %r13
> >   lea    0x10(%rsp), %r13
> >   and    $0xfffffffffffffff0, %rsp
> >   pushq  -0x8(%r13)
> >   push   %rbp
> >   mov    %rsp, %rbp
> >   push   %r13
> >   ...
> >   mov    -0x8(%rbp),%r13
> >   leaveq
> >   lea    -0x10(%r13), %rsp
> >   pop    %r13
> >   retq
> >
>
> I have a couple questions, mainly to help me understand.
>
> Question 1: What does DRAP stand for? Duplicate Return Address
> Pointer? Dynamic ReAlignment Pointer? I tried searching and got
> nothing.

It seems to be a GCC invention which stands for:

  Dynamic Realign Argument Pointer

I don't think it's documented anywhere, but there are at least some
comments about it in the GCC sources if you search for DRAP.

> Question 2: What's up with the resulting stack layout? It seems we have:
>
> caller's last stack slot <-- r13 in function body points here
> return address
> old r13
> [possible padding for alignment]
> return address, duplicated (for naive unwinder's benefit?)
> old rbp <-- rbp in body points here
> new r13, i.e. pointer to caller's last stack slot
>
> Now we have the function body, and r13 is free for use in here because
> it's saved.
>
> In the epilogue, we recover r13, use leaveq (hmm, shorter than pop
> %rbp but does more work than needed), restore the old r13, and return.
>
> I don't get it, though. gcc only ever uses that inner r13 with an
> offset. The code would be considerably shorter if the second
> instruction were just mov %rsp, %r13. That would change the push to
> pushq 0x8(%rsp) and the third-to-last instruction to mov %r13, %rsp,
> saving something like 8 bytes of code.

I don't know why it doesn't do it the way you suggest, but I'm glad it
doesn't because I think it would make the DWARF/ORC data even more
complicated. Here it's "simple", because r13 == DWARF CFA.
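
To make the "r13 == DWARF CFA" arithmetic concrete, here is a minimal
sketch in C that models only the stack-pointer and DRAP-register values
through the two prologues quoted in this thread. The struct and function
names are hypothetical (they are not objtool or GCC code), and it assumes
the usual x86-64 convention that the CFA is the value of %rsp just before
the call, i.e. entry %rsp + 8.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: only %rsp and the DRAP register are tracked. */
struct regs {
	uint64_t rsp;
	uint64_t drap;
};

/* Assumption: CFA = %rsp before the call = address of return address + 8. */
static uint64_t cfa_at_entry(uint64_t entry_rsp)
{
	return entry_rsp + 8;
}

/* r13 variant: push %r13; lea 0x10(%rsp),%r13; and $-16,%rsp */
static struct regs r13_prologue(uint64_t entry_rsp)
{
	struct regs r = { .rsp = entry_rsp, .drap = 0 };

	r.rsp -= 8;			/* push %r13 */
	r.drap = r.rsp + 0x10;		/* lea 0x10(%rsp), %r13 */
	r.rsp &= ~(uint64_t)0xf;	/* and $0xfffffffffffffff0, %rsp */
	return r;
}

/* r10 variant: leaq 8(%rsp),%r10; andq $-32,%rsp */
static struct regs r10_prologue(uint64_t entry_rsp)
{
	struct regs r = { .rsp = entry_rsp, .drap = 0 };

	r.drap = r.rsp + 8;		/* leaq 8(%rsp), %r10 */
	r.rsp &= ~(uint64_t)0x1f;	/* andq $-32, %rsp */
	return r;
}

/* r13 epilogue: lea -0x10(%r13),%rsp; pop %r13; %rsp seen by retq */
static uint64_t r13_epilogue_rsp(uint64_t drap)
{
	uint64_t rsp = drap - 0x10;	/* lea -0x10(%r13), %rsp */

	rsp += 8;			/* pop %r13 */
	return rsp;
}
```

In both variants the DRAP register ends up equal to cfa_at_entry(), no
matter how much padding the "and" introduces, and the r13 epilogue hands
retq the same %rsp the function was entered with.
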
> I also don't get why any of this is needed. Couldn't the compiler
> just do push %rbp; mov %rsp, %rbp; and $0xfffffffffffffff0, %rsp and
> be done with it?

Good question. I wish it did just use the frame pointer, because
dealing with DRAP has been a headache.

> I compiled this:
>
> void func()
> {
>         int var __attribute__((aligned(32)));
>         asm volatile ("" :: "m" (var));
> }
>
> and got:
>
> func:
>         leaq    8(%rsp), %r10
>         andq    $-32, %rsp
>         pushq   -8(%r10)
>         pushq   %rbp
>         movq    %rsp, %rbp
>         pushq   %r10
>         popq    %r10
>         popq    %rbp
>         leaq    -8(%r10), %rsp
>         ret
>
> Which is better than the crud you pasted, since it at least uses a
> caller-saved reg (r10), but we still have the nasty addressing modes
> *and* an unnecessary push and pop of r10.
>
> I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81825 and maybe
> some GCC person has a clue what's going on.

I've found that, when it does this DRAP pattern, most of the time it
uses r10. The r13 version seems to be rarer. I can provide a
real-world r13 example if that would help.

--
Josh