Re: objtool - what if I want to clobber rbp?

From: Josh Poimboeuf
Date: Tue Nov 21 2017 - 22:31:15 EST

On Tue, Nov 21, 2017 at 10:55:23PM +0100, Jason A. Donenfeld wrote:
> Hi Josh,
>
> We're working on some highly optimized assembly crypto primitive
> implementations for WireGuard. The last 24 hours have been spent
> trying to make objtool happy with a variety of tricks, some more
> unfortunate than others. There's still one issue remaining, however,
> and I just can't figure out how to make it go away:
>
> poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx uses BP as a scratch register
> poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx2 uses BP as a scratch register
> poly1305-x86_64.o: warning: objtool: poly1305_blocks_avx512 uses BP as a scratch register
>
> The messages are right. We're using %rbp as a general purpose
> register, writing into it all sorts of crypto gibberish certainly not
> suitable for walking stack frames. It's hard to find a way of not
> using it, without incurring a speed penalty. We really do just need
> all the registers.
>
> Of course the "problem" goes away with the new slick ORC unwinding,
> which is great. But for frame pointer unwinding, this problem remains.
> I'm wondering if you can think of any clever way of marking a function
> or the like that will make this issue go away, somehow. Is there any
> path forward without sacrificing %rbp and hence performance to a
> useless frame pointer?

Hi Jason,

Unfortunately I don't have an easy answer.

The problem is that using %rbp as a scratch register isn't compatible
with CONFIG_FRAME_POINTER. GCC doesn't do it, even on leaf functions,
and we also enforce that rule on asm code. If your function gets
interrupted, and the interrupt handler needs to dump the stack, the
unwinder will get confused by the %rbp value and the rest of the unwind
will fail.
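
To illustrate the failure mode described above, here is a minimal
user-space sketch in C. It is not the kernel's unwinder: the struct
layout mirrors the usual x86_64 frame record (saved %rbp, then return
address), but the names struct frame, bp_is_sane(), unwind_from() and
demo_unwind() are all hypothetical, and the "stack" is just a small
array standing in for real frames.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Frame record as laid out when frame pointers are enabled:
 * %rbp points at the saved caller %rbp, with the return address above it. */
struct frame {
    struct frame *next_bp;  /* saved %rbp of the caller */
    uintptr_t ret_addr;     /* return address into the caller */
};

/* Hypothetical sanity check: bp must be aligned and lie inside the stack. */
static int bp_is_sane(uintptr_t bp, uintptr_t lo, uintptr_t hi)
{
    return bp % sizeof(uintptr_t) == 0 &&
           bp >= lo && bp + sizeof(struct frame) <= hi;
}

/* Follow the saved-%rbp chain starting at bp and record return addresses.
 * Returns the number of frames recovered; 0 means the unwind failed at once. */
static size_t unwind_from(uintptr_t bp, uintptr_t lo, uintptr_t hi,
                          uintptr_t *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    while (n < max && bp_is_sane(bp, lo, hi)) {
        const struct frame *f = (const struct frame *)bp;
        uintptr_t next = (uintptr_t)f->next_bp;

        out[n++] = f->ret_addr;
        if (next <= bp)     /* callers live at higher addresses; stop at the end */
            break;
        bp = next;
    }
    return n;
}

/* Build a three-deep fake call chain, then unwind from either the real
 * frame pointer or a value that a crypto routine might have left in %rbp. */
static size_t demo_unwind(int clobber_bp)
{
    struct frame stack[3];
    uintptr_t trace[8];
    uintptr_t lo = (uintptr_t)&stack[0];
    uintptr_t hi = (uintptr_t)&stack[3];
    uintptr_t bp;

    stack[0].next_bp = &stack[1]; stack[0].ret_addr = 0x1000;
    stack[1].next_bp = &stack[2]; stack[1].ret_addr = 0x2000;
    stack[2].next_bp = NULL;      stack[2].ret_addr = 0x3000;

    /* An odd value can never be a valid, aligned frame pointer. */
    bp = clobber_bp ? (uintptr_t)0x5a5a5a5a5a5a5a5bULL : lo;
    return unwind_from(bp, lo, hi, trace, 8);
}
```

With an intact chain, demo_unwind(0) recovers all three frames. With
a scratch value in place of the frame pointer, demo_unwind(1) recovers
none, which is the "rest of the unwind will fail" case.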

So, a few ideas:

- Make your feature conflict with CONFIG_FRAME_POINTER on x86_64. The
  ORC unwinder is now the default anyway for 4.15, and we renamed the
  configs, so most people will be actively switching to ORC.

- Add some ifdefs so your code only uses %rbp as a scratch register when

- If one of the registers is used less often than the others, you could
  spill it to the stack. I know you said you need all the registers,
  but I'd be willing to review the code for ideas, if that would help.
  Sometimes it helps to get fresh eyes on the problem. We were able to
  fix this problem with all the existing crypto code without affecting
  performance measurably. We had to get creative with a few of those.
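
As a rough sketch of the ifdef idea from the list above, the selection
could be keyed on the real Kconfig symbol CONFIG_FRAME_POINTER. This
assumes the intent is to allow %rbp as a scratch register only when
frame pointers are not enabled. The names POLY_RBP_IS_SCRATCH and
poly_uses_rbp_scratch() are hypothetical, and the actual assembly
variants are left out entirely.

```c
/* CONFIG_FRAME_POINTER is the real Kconfig symbol. Everything named
 * POLY_* or poly_* here is a hypothetical name used for illustration. */
#ifdef CONFIG_FRAME_POINTER
# define POLY_RBP_IS_SCRATCH 0  /* %rbp must hold a valid frame pointer */
#else
# define POLY_RBP_IS_SCRATCH 1  /* assumption: %rbp is free to clobber */
#endif

/* Report which variant this build selected. A real implementation would
 * instead assemble a different code path, for example one that spills a
 * rarely used register to the stack when %rbp is off limits. */
static int poly_uses_rbp_scratch(void)
{
    return POLY_RBP_IS_SCRATCH;
}
```

Built without -DCONFIG_FRAME_POINTER, poly_uses_rbp_scratch() reports
that the %rbp-clobbering variant is selected. Defining the symbol flips
it to the variant that preserves %rbp.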

BTW, since CONFIG_FRAME_POINTER is no longer the default and is becoming
deprecated, there has been some talk of disabling objtool with
CONFIG_FRAME_POINTER. That would make your life easier. However I
think we're not quite ready for that, so it might be a few more release
cycles before that happens.