Re: [RFC PATCH] x86/retpolines: Prevent speculation after RET

From: Andrew Cooper
Date: Fri Feb 19 2021 - 07:10:00 EST


On 19/02/2021 08:15, Peter Zijlstra wrote:
> On Thu, Feb 18, 2021 at 08:11:38PM +0100, Borislav Petkov wrote:
>> On Thu, Feb 18, 2021 at 08:02:31PM +0100, Peter Zijlstra wrote:
>>> On Thu, Feb 18, 2021 at 07:46:39PM +0100, Borislav Petkov wrote:
>>>> Both vendors speculate after a near RET in some way:
>>>>
>>>> Intel:
>>>>
>>>> "Unlike near indirect CALL and near indirect JMP, the processor will not
>>>> speculatively execute the next sequential instruction after a near RET
>>>> unless that instruction is also the target of a jump or is a target in a
>>>> branch predictor."
>>> Right, the way I read that means it's not a problem for us here.
>> Look at that other thread: the instruction *after* the RET can be
>> speculatively executed if that instruction is the target of a jump or it
>> is in a branch predictor.
> Right, but that has nothing to do with the RET instruction itself. You
> can speculatively execute any random instruction by training the BTB,
> which is I suppose the entire point of things :-)
>
> So the way I read it is that: RET does not 'leak' speculation, but if
> you target the instruction after RET with any other speculation crud,
> ofcourse you can get it to 'run'.
>
> And until further clarified, I'll stick with that :-)

https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf
Final page, Mitigation G-5

Some parts (before Milan I believe that CPUID rule translates into) may
speculatively execute the instructions sequentially following a call/jmp
indirect or ret instruction.

For Intel, its just call/jmp instructions.  From SDM Vol2 for CALL (and
similar for JMP)

"Certain situations may lead to the next sequential instruction after a
near indirect CALL being speculatively executed. If software needs to
prevent this (e.g., in order to prevent a speculative execution side
channel), then an LFENCE instruction opcode can be placed after the near
indirect CALL in order to block speculative execution."


In both cases, the reason LFENCE is given is for the CALL case, where
there is sequential architectural execution.  JMP and RET do not have
architectural execution following them, so can use a shorter speculation
blocker.

When compiling with retpoline, all CALL/JMP indirects are removed, other
than within the __x86_indirect_thunk_%reg blocks, and those can be fixed
by hand.  That just leaves RET speculation, which has no following
architectural execution, at which point `ret; int3` is the shortest way
of halting speculation, at half the size of `ret; lfence`.

With a gcc toolchain, it does actually work if you macro 'ret' (and
retl/q) to be .byte 0xc3, 0xcc, but this doesn't work for Clang IAS
which refuses to macro real instructions.

What would be massively helpful if is the toolchains could have their
existing ARM straight-line-speculation support hooked up appropriately
so we get some new code gen options on x86, and don't have to resort to
the macro bodges above.

~Andrew