Re: [tip:x86/asm] objtool: Track DRAP separately from callee-saved registers
From: hpa
Date: Fri Aug 11 2017 - 13:30:55 EST
On August 11, 2017 9:57:13 AM PDT, Josh Poimboeuf <jpoimboe@xxxxxxxxxx> wrote:
>On Fri, Aug 11, 2017 at 09:22:11AM -0700, Andy Lutomirski wrote:
>> On Fri, Aug 11, 2017 at 5:13 AM, tip-bot for Josh Poimboeuf
>> <tipbot@xxxxxxxxx> wrote:
>> > Commit-ID: bf4d1a83758368c842c94cab9661a75ca98bc848
>> > Gitweb: http://git.kernel.org/tip/bf4d1a83758368c842c94cab9661a75ca98bc848
>> > Author: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
>> > AuthorDate: Thu, 10 Aug 2017 16:37:26 -0500
>> > Committer: Ingo Molnar <mingo@xxxxxxxxxx>
>> > CommitDate: Fri, 11 Aug 2017 14:06:15 +0200
>> >
>> > objtool: Track DRAP separately from callee-saved registers
>> >
>> > When GCC realigns a function's stack, it sometimes uses %r13 as the
>> > DRAP register, like:
>> >
>> > push %r13
>> > lea 0x10(%rsp), %r13
>> > and $0xfffffffffffffff0, %rsp
>> > pushq -0x8(%r13)
>> > push %rbp
>> > mov %rsp, %rbp
>> > push %r13
>> > ...
>> > mov -0x8(%rbp),%r13
>> > leaveq
>> > lea -0x10(%r13), %rsp
>> > pop %r13
>> > retq
>> >
>>
>> I have a couple questions, mainly to help me understand.
>>
>> Question 1: What does DRAP stand for? Duplicate Return Address
>> Pointer? Dynamic ReAlignment Pointer? I tried searching and got
>> nothing.
>
>It seems to be a GCC invention which stands for:
>
> Dynamic Realign Argument Pointer.
>
>I don't think it's documented anywhere, but there are at least some
>comments about it in the GCC sources if you search for DRAP.
>
>> Question 2: What's up with the resulting stack layout? It seems we
>> have:
>>
>> caller's last stack slot <-- r13 in function body points here
>> return address
>> old r13
>> [possible padding for alignment]
>> return address, duplicated (for naive unwinder's benefit?)
>> old rbp <-- rbp in body points here
>> new r13, i.e. pointer to caller's last stack slot
>>
>> Now we have the function body, and r13 is free for use in here
>> because it's saved.
>>
>> In the epilogue, we recover r13, use leaveq (hmm, shorter than pop
>> %rbp but does more work than needed), restore the old r13, and
>> return.
>>
>> I don't get it, though. gcc only ever uses that inner r13 with an
>> offset. The code would be considerably shorter if the second
>> instruction were just mov %rsp, %r13. That would change the push to
>> pushq 0x8(%r13) and the third-to-last instruction to mov %r13, %rsp,
>> saving something like 8 bytes of code.
>
>I don't know why it doesn't do it the way you suggest, but I'm glad it
>doesn't because I think it would make the DWARF/ORC data even more
>complicated. Here it's "simple", because r13 == DWARF CFA.
>
>> I also don't get why any of this is needed. Couldn't the compiler
>> just do push %rbp; mov %rsp, %rbp; and $0xfffffffffffffff0, %rsp and
>> be done with it?
>
>Good question. I wish it did just use the frame pointer, because
>dealing with DRAP has been a headache.
>
>> I compiled this:
>>
>> void func()
>> {
>> int var __attribute__((aligned(32)));
>> asm volatile ("" :: "m" (var));
>> }
>>
>> and got:
>>
>> func:
>> leaq 8(%rsp), %r10
>> andq $-32, %rsp
>> pushq -8(%r10)
>> pushq %rbp
>> movq %rsp, %rbp
>> pushq %r10
>> popq %r10
>> popq %rbp
>> leaq -8(%r10), %rsp
>> ret
>>
>> Which is better than the crud you pasted, since it at least uses a
>> caller-saved reg (r10), but we still have the nasty addressing modes
>> *and* an unnecessary push and pop of r10.
>>
>> I filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81825 and maybe
>> some GCC person has a clue what's going on.
>
>I've found that, when it does this DRAP pattern, most of the time it
>uses r10. The r13 version seems to be more rare. I can provide a
>real-world r13 example if that would help.
One would logically expect %r10 whenever a call-clobbered register is sufficient. It would make sense to do that register renaming fairly late in the game.