Re: [PATCH 13/14] x86: BHI stubs
From: Peter Zijlstra
Date: Thu Oct 03 2024 - 08:19:04 EST
On Tue, Oct 01, 2024 at 12:20:02PM +0100, Andrew Cooper wrote:
> On 01/10/2024 12:03 pm, Peter Zijlstra wrote:
> > On Mon, Sep 30, 2024 at 03:38:48PM -0700, Josh Poimboeuf wrote:
> >> On Mon, Sep 30, 2024 at 11:23:38PM +0100, Andrew Cooper wrote:
> >>> On 30/09/2024 10:30 pm, Josh Poimboeuf wrote:
> >>>> On Fri, Sep 27, 2024 at 09:49:09PM +0200, Peter Zijlstra wrote:
> >>>>> +SYM_INNER_LABEL(__bhi_args_0, SYM_L_LOCAL)
> >>>>> + UNWIND_HINT_FUNC
> >>>>> + cmovne %r10, %rdi
> >>>> IIUC, this works because if the "jz" in the CFI preamble mispredicts to
> >>>> the __bhi_args_* code, "cmovne" will zero out the speculative value of
> >>>> rdi.
> >>>>
> >>>> Why use %r10 instead of a literal $0? Also how do you know %r10 is 0?
> >>> There's no encoding for CMOVcc which takes an $imm.
> >> Ah.
> >>
> >>> %r10 is guaranteed zero after the FineIBT prologue
> >> If the "jz" in the FineIBT prologue mispredicts, isn't %r10 non-zero by
> >> definition?
> > Since I just wrote the comment...
> >
> > * FineIBT-BHI:
> > *
> > * __cfi_foo:
> > * endbr64
> > * subl 0x12345678, %r10d
> > * jz foo-1
> > * ud2
> > * foo-1:
> > * call __bhi_args_XXX
> > * foo+4:
> > * ... code here ...
> > * ret
> > *
> > * direct caller:
> > * call foo+4
> > *
> > * indirect caller:
> > * lea foo(%rip), %r11
> > * ...
> > * movl $0x12345678, %r10d
> > * subl $16, %r11
>
> A compiler would normally spell this:
>
> lea foo-16(%rip), %r11
Yeah, but the original lea is outside of the code we control, the kcfi
caller section we're rewriting gets to have %r11 as is.
> > * nop4
> > * call *%r11
> >
> > And lets take a random bhi function:
> >
> > + .align 16
> > +SYM_INNER_LABEL(__bhi_args_0_1, SYM_L_LOCAL)
> > + UNWIND_HINT_FUNC
> > + cmovne %r10, %rdi
> > + cmovne %r10, %rsi
> > + ANNOTATE_UNRET_SAFE
> > + ret
> > + int3
> >
> > So the case you worry about is SUBL does *not* result in 0, but we
> > speculate JZ true and end up in CALL, and do CMOVne.
> >
> > Since we speculated Z, we must then also not do the CMOV, so the value
> > of R10 is irrelevant, it will not be used. The thing however is that
> > CMOV will unconditionally put a store dependency on the target register
> > (RDI, RSI in the above sequence) and as such any further speculative
> > code trying to use those registers will stall.
>
> How does that help?
>
> The write dependency doesn't stop a dependent load from executing in the
> shadow of a mispredicted branch.
I've been given to understand CMOVcc will kill any further speculation
using the target register. So by 'poisoning' all argument registers that
are involved with loads, we avoid any such load from happening during
speculation.