Re: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence

From: Andrew Cooper
Date: Wed Feb 19 2025 - 12:15:35 EST

Next message: Maciej W. Rozycki: "Re: [PATCH v6 2/6] syscall.h: add syscall_set_arguments()"
Previous message: Dave Jiang: "Re: [PATCH v2 2/7] cxl/core: cxl_mem_sanitize() cleanup"
In reply to: Peter Zijlstra: "[PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence"
Next in thread: Constable, Scott D: "RE: [PATCH v3 05/10] x86/ibt: Optimize FineIBT sequence"

On 19/02/2025 4:21 pm, Peter Zijlstra wrote:
> Scott notes that non-taken branches are faster. Abuse overlapping code
> that traps instead of explicit UD2 instructions.
>
> And LEA does not modify flags and will have less dependencies.
>
> Suggested-by: Scott Constable <scott.d.constable@xxxxxxxxx>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>

Can we get a bit more info on this "non-taken branches are faster" claim?

For modern cores which have branch prediction pre-decode, a branch
unknown to the predictor will behave as non-taken until the Jcc executes[1].

Something the size of Linux is surely going to exceed the branch
predictor's capacity, so it's perhaps fair to say that there's a
reasonable chance of missing in the predictor.

But, for a branch known to the predictor, taken branches ought to be
bubble-less these days. At least, this is what the marketing material
claims.

And, this doesn't account for branches which alias in the predictor and
end up with a wrong prediction.
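
As a rough illustration of the two cases above (a branch unknown to the
predictor, and two branches aliasing in it), here is a toy direct-mapped,
untagged predictor in C. Everything in it is an assumption made for
illustration: the table size, the modulo index, the one-bit "last
direction" state, and the names btb, predict(), update() and execute()
are hypothetical and do not describe any real core.

```c
#include <stdbool.h>
#include <stdint.h>

#define BTB_ENTRIES 16	/* hypothetical; real predictors are far larger */

struct btb_entry {
	bool valid;	/* has any branch trained this slot? */
	bool taken;	/* last direction observed through this slot */
};

static struct btb_entry btb[BTB_ENTRIES];

/* Hypothetical index: low bits of the branch address, with no tag check. */
static unsigned int btb_index(uint64_t ip)
{
	return (unsigned int)(ip % BTB_ENTRIES);
}

/* A branch with no trained entry is predicted not-taken (fall through). */
static bool predict(uint64_t ip)
{
	const struct btb_entry *e = &btb[btb_index(ip)];

	return e->valid && e->taken;
}

/* Record the resolved direction once the Jcc has executed. */
static void update(uint64_t ip, bool taken)
{
	struct btb_entry *e = &btb[btb_index(ip)];

	e->valid = true;
	e->taken = taken;
}

/* Run one branch through the model; returns true on a mispredict. */
static bool execute(uint64_t ip, bool taken)
{
	bool mispredicted = predict(ip) != taken;

	update(ip, taken);
	return mispredicted;
}
```

In this model a never-seen branch is only "free" if it is actually
not-taken. And because the slots carry no tag, a branch at 0x1000 and one
at 0x1000 + BTB_ENTRIES share a slot, so each inherits whatever direction
the other last trained. Real predictors use tags, history and multiple
tables, so this only shows the shape of the problem, not its magnitude.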

~Andrew

[1] Yes, I know RWC (Redwood Cove) has reintroduced the 0x3e
branch-taken hint prefix, with the decode resteer.
