Re: [PATCH v9 02/10] x86/bhi: Make clear_bhb_loop() effective on newer CPUs

From: Pawan Gupta

Date: Fri Apr 03 2026 - 19:37:59 EST


On Fri, Apr 03, 2026 at 04:22:28PM -0700, Jim Mattson wrote:
> On Fri, Apr 3, 2026 at 4:16 PM Pawan Gupta
> <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Apr 03, 2026 at 02:59:33PM -0700, Jim Mattson wrote:
> > > On Fri, Apr 3, 2026 at 2:34 PM Pawan Gupta
> > > <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Apr 03, 2026 at 01:19:17PM -0700, Jim Mattson wrote:
> > > > > On Fri, Apr 3, 2026 at 11:52 AM Pawan Gupta
> > > > > <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, Apr 03, 2026 at 11:10:08AM -0700, Jim Mattson wrote:
> > > > > > > On Thu, Apr 2, 2026 at 5:32 PM Pawan Gupta
> > > > > > > <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > > > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > > > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > > > > > was not an issue because these CPUs use the BHI_DIS_S hardware mitigation
> > > > > > > > in the kernel.
> > > > > > > >
> > > > > > > > Now, with VMSCAPE (a BHI variant), it is also necessary to isolate branch
> > > > > > > > history between guests and userspace. Since BHI_DIS_S only protects the
> > > > > > > > kernel, the newer CPUs also use IBPB.
> > > > > > > >
> > > > > > > > A cheaper alternative to the current IBPB mitigation is clear_bhb_loop().
> > > > > > > > But it currently does not clear enough BHB entries to be effective on newer
> > > > > > > > CPUs with a larger BHB. At boot, dynamically set the loop count of
> > > > > > > > clear_bhb_loop() such that it is effective on newer CPUs too. Use the
> > > > > > > > X86_FEATURE_BHI_CTRL feature flag to select the appropriate loop count.
> > > > > > > >
> > > > > > > > Suggested-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> > > > > > > > Signed-off-by: Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx>
> > > > > > > > ---
> > > > > > > > arch/x86/entry/entry_64.S | 8 +++++---
> > > > > > > > arch/x86/include/asm/nospec-branch.h | 2 ++
> > > > > > > > arch/x86/kernel/cpu/bugs.c | 13 +++++++++++++
> > > > > > > > 3 files changed, 20 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
> > > > > > > > index 3a180a36ca0e..bbd4b1c7ec04 100644
> > > > > > > > --- a/arch/x86/entry/entry_64.S
> > > > > > > > +++ b/arch/x86/entry/entry_64.S
> > > > > > > > @@ -1536,7 +1536,9 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > > > > ANNOTATE_NOENDBR
> > > > > > > > push %rbp
> > > > > > > > mov %rsp, %rbp
> > > > > > > > - movl $5, %ecx
> > > > > > > > +
> > > > > > > > + movzbl bhb_seq_outer_loop(%rip), %ecx
> > > > > > > > +
> > > > > > > > ANNOTATE_INTRA_FUNCTION_CALL
> > > > > > > > call 1f
> > > > > > > > jmp 5f
> > > > > > > > @@ -1556,8 +1558,8 @@ SYM_FUNC_START(clear_bhb_loop)
> > > > > > > > * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc
> > > > > > > > * but some Clang versions (e.g. 18) don't like this.
> > > > > > > > */
> > > > > > > > - .skip 32 - 18, 0xcc
> > > > > > > > -2: movl $5, %eax
> > > > > > > > + .skip 32 - 20, 0xcc
> > > > > > > > +2: movzbl bhb_seq_inner_loop(%rip), %eax
> > > > > > > > 3: jmp 4f
> > > > > > > > nop
> > > > > > > > 4: sub $1, %eax
> > > > > > > > diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
> > > > > > > > index 70b377fcbc1c..87b83ae7c97f 100644
> > > > > > > > --- a/arch/x86/include/asm/nospec-branch.h
> > > > > > > > +++ b/arch/x86/include/asm/nospec-branch.h
> > > > > > > > @@ -548,6 +548,8 @@ DECLARE_PER_CPU(u64, x86_spec_ctrl_current);
> > > > > > > > extern void update_spec_ctrl_cond(u64 val);
> > > > > > > > extern u64 spec_ctrl_current(void);
> > > > > > > >
> > > > > > > > +extern u8 bhb_seq_inner_loop, bhb_seq_outer_loop;
> > > > > > > > +
> > > > > > > > /*
> > > > > > > > * With retpoline, we must use IBRS to restrict branch prediction
> > > > > > > > * before calling into firmware.
> > > > > > > > diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
> > > > > > > > index 83f51cab0b1e..2cb4a96247d8 100644
> > > > > > > > --- a/arch/x86/kernel/cpu/bugs.c
> > > > > > > > +++ b/arch/x86/kernel/cpu/bugs.c
> > > > > > > > @@ -2047,6 +2047,10 @@ enum bhi_mitigations {
> > > > > > > > static enum bhi_mitigations bhi_mitigation __ro_after_init =
> > > > > > > > IS_ENABLED(CONFIG_MITIGATION_SPECTRE_BHI) ? BHI_MITIGATION_AUTO : BHI_MITIGATION_OFF;
> > > > > > > >
> > > > > > > > +/* Default to short BHB sequence values */
> > > > > > > > +u8 bhb_seq_outer_loop __ro_after_init = 5;
> > > > > > > > +u8 bhb_seq_inner_loop __ro_after_init = 5;
> > > > > > > > +
> > > > > > > > static int __init spectre_bhi_parse_cmdline(char *str)
> > > > > > > > {
> > > > > > > > if (!str)
> > > > > > > > @@ -3242,6 +3246,15 @@ void __init cpu_select_mitigations(void)
> > > > > > > > x86_spec_ctrl_base &= ~SPEC_CTRL_MITIGATIONS_MASK;
> > > > > > > > }
> > > > > > > >
> > > > > > > > + /*
> > > > > > > > + * Switch to long BHB clear sequence on newer CPUs (with BHI_CTRL
> > > > > > > > + * support), see Intel's BHI guidance.
> > > > > > > > + */
> > > > > > > > + if (cpu_feature_enabled(X86_FEATURE_BHI_CTRL)) {
> > > > > > > > + bhb_seq_outer_loop = 12;
> > > > > > > > + bhb_seq_inner_loop = 7;
> > > > > > > > + }
> > > > > > > > +
> > > > > > >
> > > > > > > How does this work for VMs in a heterogeneous migration pool that
> > > > > > > spans the Alder Lake boundary? They can't advertise BHI_CTRL, because
> > > > > > > it isn't available on all hosts in the migration pool, but they need
> > > > > > > the long sequence when running on Alder Lake or newer.
> > > > > >
> > > > > > As we discussed elsewhere, support for migration pools is much more
> > > > > > involved. It should be dealt with in a separate QEMU/KVM-focused series.
> > > > > >
> > > > > > A quick fix could be to add a spectre_bhi=long option that guests in a
> > > > > > migration pool can use?
> > > > >
> > > > > The simplest solution is to add "||
> > > > > cpu_feature_enabled(X86_FEATURE_HYPERVISOR)" to the condition above.
> > > > > If that is unacceptable for the performance of pre-Alder Lake
> > > >
> > > > Yes, that would be unnecessary overhead.
> > > >
> > > > > migration pools, you could define a CPUID or MSR bit that says
> > > > > explicitly, "long BHB flush sequence needed," rather than trying to
> > > > > intuit that property from the presence of BHI_CTRL. Like
> > > > > IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY, the bit would only be set
> > > > > by a hypervisor.
> > > >
> > > > I will think about this more.
> > > >
> > > > > I am still skeptical of the need for MSR_VIRTUAL_ENUMERATION and
> > > > > friends, unless there is a major guest OS out there that relies on
> > > > > them.
> > > >
> > > > If we forget about MSR_VIRTUAL_ENUMERATION for a moment, the userspace
> > > > VMM is in the best position to decide whether a guest needs
> > > > virtual.SPEC_CTRL[BHI_DIS_S]. Via a KVM interface, the userspace VMM
> > > > could get BHI_DIS_S for the guests that are in a migration pool?
> > >
> > > That is not possible today, since KVM does not implement Intel's
> > > IA32_SPEC_CTRL virtualization, and cedes the hardware IA32_SPEC_CTRL
> > > to the guest after the first non-zero write to the guest's MSR.
> >
> > Yes, KVM doesn't support it yet. But adding that support to give more
> > control to the userspace VMM helps this case, and probably many others in
> > the future.
>
> But didn't you tell me that Windows doesn't want the hypervisor to set
> BHI_DIS_S behind their back?

Since cloud providers have greater control over userspace, the decision to
use BHI_DIS_S or not can be left to them. KVM would simply follow what it
is asked to do by userspace.

> > I will check with Chao if he can prepare the next version of the virtual
> > SPEC_CTRL series (leaving out the virtual mitigation MSRs).
>
> Excellent.