RE: [PATCH 13/14] x86: BHI stubs

From: Constable, Scott D
Date: Mon Oct 14 2024 - 13:50:49 EST


Hello Andrew,

Your observation is valid. If we assume that the hash function used by FineIBT is uniformly distributed, then the hashes at the call site and at the call target are each uniform over [0, 2^32-1]. Because of wrap-around, the difference of the two hashes computed in R10 has the same uniform distribution, with a mean of approximately 2^31. Therefore, to bypass the proposed mitigation in practice, I believe an attacker would need the hardened (clobbered) pointer to be added to or subtracted from an attacker-controlled 64-bit value, or an attacker-controlled 32-bit value scaled by 2, 4, or 8. Hence I think it would be reasonable to additionally apply the CMOV hardening to any 32-/64-bit integral parameters, including enums. I scanned the kernel (Ubuntu noble 6.8 config) and found that 77% of parameters to indirect call targets are pointers (which we already harden) and less than 20% are 32-/64-bit integrals and enums.
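
For concreteness, here is a minimal sketch of the kind of call target I have in mind; the function and parameter names are made up for illustration and do not come from the kernel:

  /*
   * The pointer argument "base" is CMOV-hardened, so during bad
   * speculation it holds the (< 2^32) hash residual.  But if the
   * 64-bit integral argument "offset" is not also clobbered, an
   * attacker-controlled offset added to that residual can still steer
   * the speculative load to a chosen kernel address.
   */
  unsigned long example_read(const char *base, unsigned long offset)
  {
          return *(const unsigned long *)(base + offset);
  }

Clobbering the integral argument registers with the same CMOVs would remove that degree of freedom.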

I think this proposal would also address some other potential corner cases (sketched below), such as:
- an attacker-controlled 32-/64-bit integral parameter is used to index into a fixed-address array
- an attacker-controlled 64-bit integral parameter is cast to a pointer
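
As a rough sketch of those two cases (again, the names here are purely illustrative):

  static unsigned long fixed_table[256];  /* array at a fixed kernel address */

  /* Case 1: a 32-bit integral parameter used as a scaled index; the
   * index alone determines the speculative load address. */
  unsigned long example_index(unsigned int idx)
  {
          return fixed_table[idx];
  }

  /* Case 2: a 64-bit integral parameter cast directly to a pointer
   * and dereferenced. */
  unsigned long example_cast(unsigned long addr)
  {
          return *(unsigned long *)addr;
  }

In both cases the pointer-only hardening never touches the register that actually forms the address, which is why I would extend the CMOVs to integral parameters as well.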

Does this proposal address your concern?

Thanks and regards,

Scott Constable

> On 03/10/2024 1:17 pm, Peter Zijlstra wrote:
>> On Tue, Oct 01, 2024 at 12:20:02PM +0100, Andrew Cooper wrote:
>>> On 01/10/2024 12:03 pm, Peter Zijlstra wrote:
>>>> * nop4
>>>> * call *%r11
>>>>
>>>> And lets take a random bhi function:
>>>>
>>>> + .align 16
>>>> +SYM_INNER_LABEL(__bhi_args_0_1, SYM_L_LOCAL)
>>>> + UNWIND_HINT_FUNC
>>>> + cmovne %r10, %rdi
>>>> + cmovne %r10, %rsi
>>>> + ANNOTATE_UNRET_SAFE
>>>> + ret
>>>> + int3
>>>>
>>>> So the case you worry about is SUBL does *not* result in 0, but we
>>>> speculate JZ true and end up in CALL, and do CMOVne.
>>>>
>>>> Since we speculated Z, we must then also not do the CMOV, so the
>>>> value of R10 is irrelevant, it will not be used. The thing however
>>>> is that CMOV will unconditionally put a store dependency on the
>>>> target register (RDI, RSI in the above sequence) and as such any
>>>> further speculative code trying to use those registers will stall.
>>> How does that help?
>>>
>>> The write dependency doesn't stop a dependent load from executing in
>>> the shadow of a mispredicted branch.
>> I've been given to understand CMOVcc will kill any further speculation
>> using the target register. So by 'poisoning' all argument registers
>> that are involved with loads, we avoid any such load from happening
>> during speculation.

> IANAPA (I am not a pipeline architect), but AIUI,

> CMOVcc establishes a data dependency between flags and the destination register that doesn't exist in the pipeline if you'd used a conditional branch instead.

> It does prevent a dependent load from executing before the CMOVcc has executed.  But it does not stop that load from executing speculatively eventually.

> So, given the following case:

> * SUB result is/will be nonzero (ZF=0, %r10=nonzero)
> * JZ predicted taken, despite (ZF=0)

> we call __bhi_args_XXX wherein:

> * CMOVNZ blocks until SUB executes (flags dependency)
> * CMOVNZ eventually executes, and because ZF=0, it really does write
> %r10 over the target registers

> and then we enter the function with all pointers containing the nonzero residual from the hash check.

> Now, because it's a SUBL, the result is < 2^32, so a straight dereference of one of these pointers will be blocked by SMAP (no one cares about 32-bit, or pre-SMAP hardware, right?)

> Forward references from the pointers will be safe (assuming SIB doesn't reach the canonical boundary), but backward references may wrap around back into the kernel space.  These will not be blocked by SMAP and will spill their secrets if suitably provoked.

> ~Andrew