Re: [PATCH v8 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs

From: David Laight

Date: Thu Mar 26 2026 - 05:24:07 EST


On Thu, 26 Mar 2026 01:39:34 -0700
Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:

> On Wed, Mar 25, 2026 at 09:37:59PM +0100, Borislav Petkov wrote:
> > On Tue, Mar 24, 2026 at 03:13:08PM -0700, Pawan Gupta wrote:
> > > This is cleaner. A few things to consider: CLEAR_BRANCH_HISTORY, which
> > > calls clear_bhb_loop(), would be calling into C code very early during
> > > kernel entry. The code generated here may vary based on the compiler. Any
> > > indirect branch here would be a security risk. This needs to be noinstr so
> > > that it can't be hijacked by probes or ftrace.
> > >
> > > At kernel entry, calling into C before mitigations are applied is risky.
> >
> > You can write the above function in asm if you prefer - should still be
> > easier.
>
> I believe the equivalent for cpu_feature_enabled() in asm is the
> ALTERNATIVE. Please let me know if I am missing something.
>
> Regarding your intent to move the loop count selection out of the BHB
> sequence, below is what I could come up with. It is not as pretty as the C
> version, but it tries to achieve something similar:

I think that fails on being harder to read and longer.
So no real benefit.

I believe this code has to be asm because it is required to execute
specific instructions in a specific order - you can't trust the C
compiler to do that for you.

David

>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index ecae3cef9d8c..54c65b0a3f65 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1494,6 +1494,20 @@ SYM_CODE_START_NOALIGN(rewind_stack_and_make_dead)
> SYM_CODE_END(rewind_stack_and_make_dead)
> .popsection
>
> +/*
> + * Between the long and short version of BHB clear sequence, just the
> + * loop count differs based on BHI_CTRL, see Intel's BHI guidance.
> + */
> +#define BHB_SHORT_LOOP_OUTER 5
> +#define BHB_SHORT_LOOP_INNER 5
> +
> +#define BHB_LONG_LOOP_OUTER 12
> +#define BHB_LONG_LOOP_INNER 7
> +
> +#define BHB_MOVB(type, reg) \
> + ALTERNATIVE __stringify(movb $BHB_SHORT_LOOP_##type, reg), \
> + __stringify(movb $BHB_LONG_LOOP_##type, reg), X86_FEATURE_BHI_CTRL
> +
> /*
> * This sequence executes branches in order to remove user branch information
> * from the branch history tracker in the Branch Predictor, therefore removing
> @@ -1540,12 +1554,7 @@ SYM_FUNC_START(clear_bhb_loop_nofence)
> /* BPF caller may require all registers to be preserved */
> push %rax
>
> - /*
> - * Between the long and short version of BHB clear sequence, just the
> - * loop count differs based on BHI_CTRL, see Intel's BHI guidance.
> - */
> - ALTERNATIVE "movb $5, %al", \
> - "movb $12, %al", X86_FEATURE_BHI_CTRL
> + BHB_MOVB(OUTER, %al)
>
> ANNOTATE_INTRA_FUNCTION_CALL
> call 1f
> @@ -1567,8 +1576,7 @@ SYM_FUNC_START(clear_bhb_loop_nofence)
> * but some Clang versions (e.g. 18) don't like this.
> */
> .skip 32 - 14, 0xcc
> -2: ALTERNATIVE "movb $5, %ah", \
> - "movb $7, %ah", X86_FEATURE_BHI_CTRL
> +2: BHB_MOVB(INNER, %ah)
> 3: jmp 4f
> nop
> 4: sub $1, %ah
>
>
> Below is what the disassembly looks like:
>
> clear_bhb_loop_nofence:
> ...
> call 1f
> jmp 5f
> // BHB_MOVB(OUTER, %al)
> mov $0x5,%al