Re: [PATCH v2 03/14] x86/retpoline: Simplify retpolines

From: Peter Zijlstra
Date: Mon Mar 22 2021 - 05:34:44 EST


On Fri, Mar 19, 2021 at 05:18:14PM +0000, David Laight wrote:
> From: Peter Zijlstra
> > Sent: 18 March 2021 17:11
> >
> > Due to commit c9c324dc22aa ("objtool: Support stack layout changes
> > in alternatives"), it is possible to simplify the retpolines.
> >
> ...
> > Notice that since the longest alternative sequence is now:
> >
> >    0:   e8 07 00 00 00          callq  c <.altinstr_replacement+0xc>
> >    5:   f3 90                   pause
> >    7:   0f ae e8                lfence
> >    a:   eb f9                   jmp    5 <.altinstr_replacement+0x5>
> >    c:   48 89 04 24             mov    %rax,(%rsp)
> >   10:   c3                      retq
> >
> > 17 bytes, we have 15 bytes NOP at the end of our 32 byte slot. (IOW,
> > if we can shrink the retpoline by 1 byte we can pack it more dense)
>
> I'm intrigued about the lfence after the pause.
> Clearly this is for very warped cpu behaviour.
> To get to the pause you have to be speculating past an
> unconditional call.

Please read up on retpoline... That's the speculation trap. The warped
CPU behaviour is called Spectre-v2.

For others: the obvious alternative is the below, which replaces the
pause; lfence pair in the speculation trap with a single int3. Possibly
we could then also remove the loop.

The original retpoline, as per Paul's article, has: 1: pause; jmp 1b;.
That is, it lacks the LFENCE we have and would also fit in 16 bytes.
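
As a quick sanity check on the sizes, here is a rough sketch (not part
of the patch). The instruction lengths are taken from the listing quoted
above, plus the single-byte 0xcc encoding of int3; the variant names and
helper functions are made up for illustration only.

```python
# Encoded length in bytes of each instruction in the thunk.
# Lengths come from the disassembly quoted above; int3 is the 1-byte 0xcc form.
INSN_LEN = {
    "call rel32": 5,
    "pause": 2,
    "lfence": 3,
    "int3": 1,
    "jmp rel8": 2,
    "mov %rax,(%rsp)": 4,
    "ret": 1,
}

# Hypothetical labels for the variants discussed in this thread.
VARIANTS = {
    "pause; lfence (current)": ["call rel32", "pause", "lfence", "jmp rel8",
                                "mov %rax,(%rsp)", "ret"],
    "pause only (original)":   ["call rel32", "pause", "jmp rel8",
                                "mov %rax,(%rsp)", "ret"],
    "int3 with loop":          ["call rel32", "int3", "jmp rel8",
                                "mov %rax,(%rsp)", "ret"],
    "int3 without loop":       ["call rel32", "int3",
                                "mov %rax,(%rsp)", "ret"],
}


def size(insns):
    """Total encoded size of an instruction sequence, in bytes."""
    return sum(INSN_LEN[i] for i in insns)


def slot(nbytes):
    """Smallest power-of-two alignment slot that holds nbytes."""
    s = 1
    while s < nbytes:
        s *= 2
    return s


for name, insns in VARIANTS.items():
    n = size(insns)
    print(f"{name}: {n} bytes, {slot(n)}-byte slot, {slot(n) - n} bytes padding")
```

Only the current pause; lfence sequence spills past 16 bytes; every
other variant fits a 16-byte slot, which is what makes .align 16
possible.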



---
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -15,8 +15,7 @@
 	call .Ldo_rop_\@
 .Lspec_trap_\@:
 	UNWIND_HINT_EMPTY
-	pause
-	lfence
+	int3
 	jmp .Lspec_trap_\@
 .Ldo_rop_\@:
 	mov %\reg, (%_ASM_SP)
@@ -27,7 +26,7 @@
 .macro THUNK reg
 	.section .text.__x86.indirect_thunk

-	.align 32
+	.align 16
 SYM_FUNC_START(__x86_indirect_thunk_\reg)

 	ALTERNATIVE_2 __stringify(ANNOTATE_RETPOLINE_SAFE; jmp *%\reg), \