Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs
From: Pawan Gupta
Date: Fri Apr 03 2026 - 19:16:34 EST
On Fri, Apr 03, 2026 at 02:59:33PM -0700, Jim Mattson wrote:
> On Fri, Apr 3, 2026 at 2:34 PM Pawan Gupta
> <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Apr 03, 2026 at 01:19:17PM -0700, Jim Mattson wrote:
> > > On Fri, Apr 3, 2026 at 11:52 AM Pawan Gupta
> > > <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Apr 03, 2026 at 11:10:08AM -0700, Jim Mattson wrote:
> > > > > On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
> > > > > <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > > > was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> > > > > > in the kernel.
> > > > > >
> > > > > > Now with VMSCAPE (BHI variant) it is also required to isolate branch
> > > > > > history between guests and userspace. Since BHI_DIS_S only protects the
> > > > > > kernel, the newer CPUs also use IBPB.
> > > > > >
> > > > > > A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> > > > > > But it currently does not clear enough BHB entries to be effective on newer
> > > > > > CPUs with larger BHB. At boot, dynamically set the loop count of
> > > > > > clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> > > > > > X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
> > > > > >
> > > > > > Suggested-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > > > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx>
> > > > > > ---
> > > > > > arch/x86/entry/entry_64.S | 8 +++++---
> > > > > > arch/x86/include/asm/nospec-branch.h | 2 ++
> > > > > > arch/x86/kernel/cpu/bugs.c | 13 +++++++++++++
> > > > > > 3 files changed, 20 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > > > > > index 3a180a36ca0e..bbd4b1c7ec04 100644
> > > > > > --- a/arch/x86/entry/entry_64.S
> > > > > > +++ b/arch/x86/entry/entry_64.S
> > > > > > @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > > ANNOTATE_NOENDBR
> > > > > > push %rbp
> > > > > > mov %rsp, %rbp
> > > > > > - movl $5, %ecx
> > > > > > +
> > > > > > + movzbl bhb_seq_outer_loop(%rip), %ecx
> > > > > > +
> > > > > > ANNOTATE_INTRA_FUNCTION_CALL
> > > > > > call 1f
> > > > > > jmp 5f
> > > > > > @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > > * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
> > > > > > * but some Clang versions (e.g. 18) don't like this.
> > > > > > */
> > > > > > - .skip 32 - 18, 0xcc
> > > > > > -2: movl $5, %eax
> > > > > > + .skip 32 - 20, 0xcc
> > > > > > +2: movzbl bhb_seq_inner_loop(%rip), %eax
> > > > > > 3: jmp 4f
> > > > > > nop
> > > > > > 4: sub $1, %eax
> > > > > > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > > > > > index 70b377fcbc1c..87b83ae7c97f 100644
> > > > > > --- a/arch/x86/include/asm/nospec-branch.h
> > > > > > +++ b/arch/x86/include/asm/nospec-branch.h
> > > > > > @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
> > > > > > extern void update_spec_ctrl_cond(u64 val);
> > > > > > extern u64 spec_ctrl_current(void);
> > > > > >
> > > > > > +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> > > > > > +
> > > > > > /*
> > > > > > * With retpoline, we must use IBRS to restrict branch prediction
> > > > > > * before calling into firmware.
> > > > > > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > > > > > index 83f51cab0b1e..2cb4a96247d8 100644
> > > > > > --- a/arch/x86/kernel/cpu/bugs.c
> > > > > > +++ b/arch/x86/kernel/cpu/bugs.c
> > > > > > @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
> > > > > > static enum bhi_mitigations bhi_mitigation __ro_after_init =
> > > > > > IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
> > > > > >
> > > > > > +/* Default to short BHB sequence values */
> > > > > > +u8 bhb_seq_outer_loop __ro_after_init = 5;
> > > > > > +u8 bhb_seq_inner_loop __ro_after_init = 5;
> > > > > > +
> > > > > > static int __init spectre_bhi_parse_cmdline(char *str)
> > > > > > {
> > > > > > if (!str)
> > > > > > @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
> > > > > > x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
> > > > > > }
> > > > > >
> > > > > > + /*
> > > > > > + * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> > > > > > + * support), see Intel's BHI guidance.
> > > > > > + */
> > > > > > + if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> > > > > > + bhb_seq_outer_loop = 12;
> > > > > > + bhb_seq_inner_loop = 7;
> > > > > > + }
> > > > > > +
> > > > >
> > > > > How does this work for VMs in a heterogeneous migration pool that
> > > > > spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
> > > > > it isn't available on all hosts in the migration pool, but they need
> > > > > the long sequence when running on Alder Lake or newer.
> > > >
> > > > As we discussed elsewhere, support for a migration pool is much more
> > > > involved. It should be dealt with in a separate QEMU/KVM-focused series.
> > > >
> > > > A quick fix could be to add support for spectre_bhi=long, which guests
> > > > in a migration pool could use?
> > >
> > > The simplest solution is to add "|
> > > cpu_feature_enabled(X86_FEATURE_HYPERVISOR)" to the condition above.
> > > If that is unacceptable for the performance of pre-Alder Lake
> >
> > Yes, that would be unnecessary overhead.
> >
> > > migration pools, you could define a CPUID or MSR bit that says
> > > explicitly, "long BHB flush sequence needed," rather than trying to
> > > intuit that property from the presence of BHI_CTRL. Like
> > > IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY, the bit would only be set
> > > by a hypervisor.
> >
> > I will think about this more.
> >
> > > I am still skeptical of the need for MSR_VIRTUAL_ENUMERATION and
> > > friends, unless there is a major guest OS out there that relies on
> > > them.
> >
> > If we forget about MSR_VIRTUAL_ENUMERATION for a moment, the userspace VMM
> > is in the best position to decide whether a guest needs
> > virtual.SPEC_CTRL[BHI_DIS_S]. Via a KVM interface, the userspace VMM could
> > get BHI_DIS_S for the guests that are in a migration pool?
>
> That is not possible today, since KVM does not implement Intel's
> IA32_SPEC_CTRL virtualization, and cedes the hardware IA32_SPEC_CTRL
> to the guest after the first non-zero write to the guest's MSR.
Yes, KVM doesn't support it yet. But adding that support to give more
control to the userspace VMM helps this case, and probably many others in
the future.
I will check with Chao if he can prepare the next version of virtual
SPEC_CTRL series (leaving out virtual mitigation MSRs).