Re: [PATCH v4 04/11] x86/bhi: Make clear_bhb_loop() effective on newer CPUs

From: Jim Mattson

Date: Fri Mar 06 2026 - 17:59:13 EST


On Fri, Mar 6, 2026 at 2:32 PM Pawan Gupta
<pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
>
> On Fri, Mar 06, 2026 at 01:00:15PM -0800, Jim Mattson wrote:
> > On Wed, Nov 19, 2025 at 10:19 PM Pawan Gupta
> > <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> > >
> > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > sequence is not sufficient because it doesn't clear enough entries. This
> > > was not an issue because these CPUs have a hardware control (BHI_DIS_S)
> > > that mitigates BHI in the kernel.
> > >
> > > The BHI variant of VMSCAPE requires isolating branch history between guests and
> > > userspace. Note that there is no equivalent hardware control for userspace.
> > > To effectively isolate branch history on newer CPUs, clear_bhb_loop()
> > > should execute a sufficient number of branches to clear a larger BHB.
> > >
> > > Dynamically set the loop count of clear_bhb_loop() such that it is
> > > effective on newer CPUs too. Use the hardware control enumeration
> > > X86_FEATURE_BHI_CTRL to select the appropriate loop count.
> >
> > I didn't speak up earlier, because I have always considered the change
> > in MAXPHYADDR from ICX to SPR a hard barrier for virtual machines
> > masquerading as a different platform. Sadly, I am now losing that
> > battle. :(
> >
> > If a heterogeneous migration pool includes hosts with and without
> > BHI_CTRL, then BHI_CTRL cannot be advertised to a guest, because it is
> > not possible to emulate BHI_DIS_S on a host that doesn't have it.
> > Hence, one cannot derive the size of the BHB from the existence of
> > this feature bit.
>
> As far as VMSCAPE is concerned, the mitigation is done by the host,
> so enumeration of BHI_CTRL is not a problem. The issue that you are
> referring to exists with or without this patch.

The hypervisor *should* set IA32_SPEC_CTRL.BHI_DIS_S on the guest's
behalf when BHI_CTRL is not advertised to the guest. However, this
doesn't actually happen today. KVM does not support bit 7 of the
tertiary processor-based VM-execution controls ("virtualize
IA32_SPEC_CTRL"), and KVM cedes the IA32_SPEC_CTRL MSR to the guest on
the first non-zero write.

> I suppose your point is in the context of native BHI mitigation for the
> guests.

Specific vulnerabilities aside, my point is that one cannot infer
anything about the underlying hardware from the presence or absence of
BHI_CTRL in a VM.
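To make the inference problem concrete, here is a minimal C sketch. It
assumes a migration pool in which a CPUID feature is advertised to the
guest only if every host has it, and a guest that, as in the patch,
picks the long BHB-clearing sequence only when it sees BHI_CTRL. All
names (struct host, pool_advertises_bhi_ctrl(), guest_pick_seq(),
pool_needs_long_seq(), BHB_SEQ_SHORT/BHB_SEQ_LONG) are hypothetical and
are not kernel APIs; the sketch models the decision only, not the
clearing loop itself.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code: each host in a migration pool
 * either has BHI_CTRL (and, per the patch description, a larger BHB)
 * or does not.
 */
struct host {
	const char *name;
	bool has_bhi_ctrl;
};

/* A feature can only be advertised if every host in the pool has it. */
static bool pool_advertises_bhi_ctrl(const struct host *pool, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!pool[i].has_bhi_ctrl)
			return false;
	return true;
}

enum bhb_seq { BHB_SEQ_SHORT, BHB_SEQ_LONG };

/* Guest-side heuristic: use the long sequence iff BHI_CTRL is visible. */
static enum bhb_seq guest_pick_seq(bool guest_sees_bhi_ctrl)
{
	return guest_sees_bhi_ctrl ? BHB_SEQ_LONG : BHB_SEQ_SHORT;
}

/* The long sequence is needed if *any* host the VM may land on needs it. */
static bool pool_needs_long_seq(const struct host *pool, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (pool[i].has_bhi_ctrl)
			return true;
	return false;
}
```

With a pool of one ICX-like host (no BHI_CTRL) and one SPR-like host
(BHI_CTRL), BHI_CTRL is not advertised and the guest picks the short
sequence, yet pool_needs_long_seq() reports that some host needs the
long one. The guest's view of BHI_CTRL says nothing reliable about the
hardware it actually runs on.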

> > I think we need an explicit CPUID bit that a hypervisor can set to
> > indicate that the underlying hardware might be SPR or later.
>
> Something similar was attempted via virtual MSRs in the series below:
>
> [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
> https://lore.kernel.org/lkml/20240410143446.797262-10-chao.gao@xxxxxxxxx/
>
> Do you think a rework of this approach would help?

No, I think that whole idea is ill-conceived. As I said above, the
hypervisor should just set IA32_SPEC_CTRL.BHI_DIS_S on the guest's
behalf when BHI_CTRL is not advertised to the guest. I don't see any
value in predicating this mitigation on guest usage of the short BHB
clearing sequence. Just do it.
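A minimal sketch of the "just do it" policy follows. It assumes a
hypervisor that keeps the guest-visible IA32_SPEC_CTRL value separate
from the value loaded into hardware while the guest runs, which is what
the "virtualize IA32_SPEC_CTRL" control is intended to allow without
intercepting the MSR. The struct and function names are hypothetical,
not KVM's, and the code models the policy only, not the VMCS mask and
shadow fields. BHI_DIS_S is bit 10 of IA32_SPEC_CTRL, matching
SPEC_CTRL_BHI_DIS_S in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_SPEC_CTRL bit 10 is BHI_DIS_S. */
#define SPEC_CTRL_BHI_DIS_S (UINT64_C(1) << 10)

/* Hypothetical per-vCPU state; names are illustrative only. */
struct vcpu_spec_ctrl {
	uint64_t guest_value;	/* what the guest wrote and reads back */
	uint64_t forced_bits;	/* bits the host sets on the guest's behalf */
};

/*
 * Force BHI_DIS_S when the host supports it but BHI_CTRL is not
 * advertised to the guest (e.g. because of a heterogeneous pool).
 */
static void vcpu_spec_ctrl_init(struct vcpu_spec_ctrl *s,
				bool host_has_bhi_ctrl,
				bool guest_sees_bhi_ctrl)
{
	s->guest_value = 0;
	s->forced_bits = (host_has_bhi_ctrl && !guest_sees_bhi_ctrl) ?
			 SPEC_CTRL_BHI_DIS_S : 0;
}

/* Value loaded into the hardware MSR while the guest runs. */
static uint64_t spec_ctrl_hw_value(const struct vcpu_spec_ctrl *s)
{
	return s->guest_value | s->forced_bits;
}

/* Guest WRMSR: record the value; forced bits stay set in hardware. */
static void spec_ctrl_guest_write(struct vcpu_spec_ctrl *s, uint64_t val)
{
	s->guest_value = val;
}

/* Guest RDMSR: return only what the guest wrote, hiding forced bits. */
static uint64_t spec_ctrl_guest_read(const struct vcpu_spec_ctrl *s)
{
	return s->guest_value;
}
```

Because the forced bit is ORed in on every load and hidden on reads,
the guest can neither clear it nor observe it, and no guest opt-in is
required. A real implementation would also have to decide how to
handle guest writes to bits it does not advertise; this sketch leaves
that out.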