Re: [PATCH 13/14] x86: BHI stubs

From: Andrew Cooper
Date: Thu Oct 03 2024 - 09:59:23 EST


On 03/10/2024 1:17 pm, Peter Zijlstra wrote:
> On Tue, Oct 01, 2024 at 12:20:02PM +0100, Andrew Cooper wrote:
>> On 01/10/2024 12:03 pm, Peter Zijlstra wrote:
>>> * nop4
>>> * call *%r11
>>>
>>> And lets take a random bhi function:
>>>
>>> + .align 16
>>> +SYM_INNER_LABEL(__bhi_args_0_1, SYM_L_LOCAL)
>>> + UNWIND_HINT_FUNC
>>> + cmovne %r10, %rdi
>>> + cmovne %r10, %rsi
>>> + ANNOTATE_UNRET_SAFE
>>> + ret
>>> + int3
>>>
>>> So the case you worry about is SUBL does *not* result in 0, but we
>>> speculate JZ true and end up in CALL, and do CMOVne.
>>>
>>> Since we speculated Z, we must then also not do the CMOV, so the value
>>> of R10 is irrelevant, it will not be used. The thing however is that
>>> CMOV will unconditionally put a store dependency on the target register
>>> (RDI, RSI in the above sequence) and as such any further speculative
>>> code trying to use those registers will stall.
>> How does that help?
>>
>> The write dependency doesn't stop a dependent load from executing in the
>> shadow of a mispredicted branch.
> I've been given to understand CMOVcc will kill any further speculation
> using the target register. So by 'poisoning' all argument registers that
> are involved with loads, we avoid any such load from happening during
> speculation.

IANAPA (I am not a pipeline architect), but AIUI,

CMOVcc establishes a data dependency between flags and the destination
register that doesn't exist in the pipeline if you'd used a conditional
branch instead.

It does prevent a dependent load from executing before the CMOVcc has
executed.  But it does not stop that load from executing speculatively
eventually.

So, given the following case:

* SUB is/will results nonzero (ZF=0, %r10=nonzero)
* JZ predicted taken, despite (ZF=0)

we call __bhi_args_XXX wherein:

* CMOVNZ blocks until SUB executes (flags dependency)
* CMOVNZ eventually executes, and because ZF=0, it really does write
%r10 over the target registers

and then we enter the function with all pointers containing the nonzero
residual from the hash check.

Now, because it's a SUBL, the result is < 2^32, a straight deference of
one of these pointers will be blocked by SMAP (noone cares about 32bit,
or pre-SMAP hardware, right?)

Forward references from the pointers will be safe (assuming SIB doesn't
reach the canonical boundary), but backward references may wrap around
back into the kernel space.  These will not be blocked by SMAP and will
spill their secrets if suitably provoked.

~Andrew