Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Jim Mattson
Date: Fri Apr 03 2026 - 14:10:56 EST
On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
<pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
>
> As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> the Branch History Buffer (BHB). On Alder Lake and newer parts this
> sequence is not sufficient because it doesn't clear enough entries. This
> was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> in the kernel.
>
> Now with VMSCAPE (BHI variant) it is also required to isolate branch
> history between guests and userspace. Since BHI_DIS_S only protects the
> kernel, the newer CPUs also use IBPB.
>
> A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> But it currently does not clear enough BHB entries to be effective on newer
> CPUs with larger BHB. At boot, dynamically set the loop count of
> clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
>
> Suggested-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx>
> ---
> arch/x86/entry/entry_64.S | 8 +++++---
> arch/x86/include/asm/nospec-branch.h | 2 ++
> arch/x86/kernel/cpu/bugs.c | 13 +++++++++++++
> 3 files changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> index 3a180a36ca0e..bbd4b1c7ec04 100644
> --- a/arch/x86/entry/entry_64.S
> +++ b/arch/x86/entry/entry_64.S
> @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
> ANNOTATE_NOENDBR
> push %rbp
> mov %rsp, %rbp
> - movl $5, %ecx
> +
> + movzbl bhb_seq_outer_loop(%rip), %ecx
> +
> ANNOTATE_INTRA_FUNCTION_CALL
> call 1f
> jmp 5f
> @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
> * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
> * but some Clang versions (e.g. 18) don't like this.
> */
> - .skip 32 - 18, 0xcc
> -2: movl $5, %eax
> + .skip 32 - 20, 0xcc
> +2: movzbl bhb_seq_inner_loop(%rip), %eax
> 3: jmp 4f
> nop
> 4: sub $1, %eax
> diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> index 70b377fcbc1c..87b83ae7c97f 100644
> --- a/arch/x86/include/asm/nospec-branch.h
> +++ b/arch/x86/include/asm/nospec-branch.h
> @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
> extern void update_spec_ctrl_cond(u64 val);
> extern u64 spec_ctrl_current(void);
>
> +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> +
> /*
> * With retpoline, we must use IBRS to restrict branch prediction
> * before calling into firmware.
> diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> index 83f51cab0b1e..2cb4a96247d8 100644
> --- a/arch/x86/kernel/cpu/bugs.c
> +++ b/arch/x86/kernel/cpu/bugs.c
> @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
> static enum bhi_mitigations bhi_mitigation __ro_after_init =
> IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
>
> +/* Default to short BHB sequence values */
> +u8 bhb_seq_outer_loop __ro_after_init = 5;
> +u8 bhb_seq_inner_loop __ro_after_init = 5;
> +
> static int __init spectre_bhi_parse_cmdline(char *str)
> {
> if (!str)
> @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
> x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
> }
>
> + /*
> + * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> + * support), see Intel's BHI guidance.
> + */
> + if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> + bhb_seq_outer_loop = 12;
> + bhb_seq_inner_loop = 7;
> + }
> +

How does this work for VMs in a heterogeneous migration pool that
spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
it isn't available on all hosts in the migration pool, but they need
the long sequence when running on Alder Lake or newer.

Previously, I considered such a migration pool infeasible, because of
the change in MAXPHYADDR, but I now predict that I will lose that
battle.

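
To make the concern concrete, here is a minimal user-space C model of
the selection logic, not kernel code. The loop counts (5/5 and 12/7)
are taken from the patch; the struct, the function names, and the
boolean standing in for "host is Alder Lake or newer" are hypothetical
and exist only for illustration.

```c
#include <stdbool.h>

/* Hypothetical container for the two loop counts used by clear_bhb_loop(). */
struct bhb_seq {
	unsigned char outer;
	unsigned char inner;
};

/*
 * Models the patch: the guest kernel picks the long sequence only if
 * it can see the BHI_CTRL feature bit; otherwise it keeps the short one.
 */
static struct bhb_seq guest_selected_seq(bool guest_sees_bhi_ctrl)
{
	struct bhb_seq seq = { .outer = 5, .inner = 5 };

	if (guest_sees_bhi_ctrl) {
		seq.outer = 12;
		seq.inner = 7;
	}
	return seq;
}

/*
 * Assumption for this sketch: a host that is Alder Lake or newer needs
 * the long sequence, and an older host is fine with the short one.
 */
static struct bhb_seq host_required_seq(bool host_is_adl_or_newer)
{
	struct bhb_seq seq = { .outer = 5, .inner = 5 };

	if (host_is_adl_or_newer) {
		seq.outer = 12;
		seq.inner = 7;
	}
	return seq;
}

/*
 * True when the guest's chosen sequence is shorter than what the
 * physical CPU it is currently running on would need.
 */
static bool guest_is_underprotected(bool host_is_adl_or_newer,
				    bool guest_sees_bhi_ctrl)
{
	struct bhb_seq chosen = guest_selected_seq(guest_sees_bhi_ctrl);
	struct bhb_seq needed = host_required_seq(host_is_adl_or_newer);

	return chosen.outer < needed.outer || chosen.inner < needed.inner;
}
```

In this model, guest_is_underprotected(true, false) is the mixed-pool
case: BHI_CTRL is hidden from the guest because older hosts in the pool
lack it, yet the guest is currently running on a newer host.
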
> x86_arch_cap_msr = x86_read_arch_cap_msr();
>
> cpu_print_attack_vectors();
>
> --
> 2.34.1
>
>
>