Re: [PATCH 13/14] x86: BHI stubs

From: Andrew Cooper
Date: Tue Oct 01 2024 - 07:20:16 EST


On 01/10/2024 12:03 pm, Peter Zijlstra wrote:
> On Mon, Sep 30, 2024 at 03:38:48PM -0700, Josh Poimboeuf wrote:
>> On Mon, Sep 30, 2024 at 11:23:38PM +0100, Andrew Cooper wrote:
>>> On 30/09/2024 10:30 pm, Josh Poimboeuf wrote:
>>>> On Fri, Sep 27, 2024 at 09:49:09PM +0200, Peter Zijlstra wrote:
>>>>> +SYM_INNER_LABEL(__bhi_args_0, SYM_L_LOCAL)
>>>>> + UNWIND_HINT_FUNC
>>>>> + cmovne %r10, %rdi
>>>> IIUC, this works because if the "jz" in the CFI preamble mispredicts to
>>>> the __bhi_args_* code, "cmovne" will zero out the speculative value of
>>>> rdi.
>>>>
>>>> Why use %r10 instead of a literal $0? Also how do you know %r10 is 0?
>>> There's no encoding for CMOVcc which takes an $imm.
>> Ah.
>>
>>> %r10 is guaranteed zero after the FineIBT prologue
>> If the "jz" in the FineIBT prologue mispredicts, isn't %r10 non-zero by
>> definition?
> Since I just wrote the comment...
>
> * FineIBT-BHI:
> *
> * __cfi_foo:
> * endbr64
> * subl 0x12345678, %r10d
> * jz foo-1
> * ud2
> * foo-1:
> * call __bhi_args_XXX
> * foo+4:
> * ... code here ...
> * ret
> *
> * direct caller:
> * call foo+4
> *
> * indirect caller:
> * lea foo(%rip), %r11
> * ...
> * movl $0x12345678, %r10d
> * subl $16, %r11

A compiler would normally spell this:

    lea foo-16(%rip), %r11

> * nop4
> * call *%r11
>
> And lets take a random bhi function:
>
> + .align 16
> +SYM_INNER_LABEL(__bhi_args_0_1, SYM_L_LOCAL)
> + UNWIND_HINT_FUNC
> + cmovne %r10, %rdi
> + cmovne %r10, %rsi
> + ANNOTATE_UNRET_SAFE
> + ret
> + int3
>
> So the case you worry about is SUBL does *not* result in 0, but we
> speculate JZ true and end up in CALL, and do CMOVne.
>
> Since we speculated Z, we must then also not do the CMOV, so the value
> of R10 is irrelevant, it will not be used. The thing however is that
> CMOV will unconditionally put a store dependency on the target register
> (RDI, RSI in the above sequence) and as such any further speculative
> code trying to use those registers will stall.

How does that help?

The write dependency doesn't stop a dependent load from executing in the
shadow of a mispredicted branch.

This is why SLH forces the pointer to 0 on the bad speculation path.

~Andrew