Re: [PATCH v4 04/11] x86/bhi: Make clear_bhb_loop() effective on newer CPUs

From: Jim Mattson

Date: Fri Mar 06 2026 - 19:42:22 EST

On Fri, Mar 6, 2026 at 3:29 PM Pawan Gupta
<pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
>
> On Fri, Mar 06, 2026 at 02:57:13PM -0800, Jim Mattson wrote:
> > On Fri, Mar 6, 2026 at 2:32 PM Pawan Gupta
> > <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Mar 06, 2026 at 01:00:15PM -0800, Jim Mattson wrote:
> > > > On Wed, Nov 19, 2025 at 10:19 PM Pawan Gupta
> > > > <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > > the Branch History Buffer (BHB). On Alder Lake and newer parts this
> > > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > > was not an issue because these CPUs have a hardware control (BHI_DIS_S)
> > > > > that mitigates BHI in kernel.
> > > > >
> > > > > The BHI variant of VMSCAPE requires isolating branch history between guests and
> > > > > userspace. Note that there is no equivalent hardware control for userspace.
> > > > > To effectively isolate branch history on newer CPUs, clear_bhb_loop()
> > > > > should execute a sufficient number of branches to clear a larger BHB.
> > > > >
> > > > > Dynamically set the loop count of clear_bhb_loop() such that it is
> > > > > effective on newer CPUs too. Use the hardware control enumeration
> > > > > X86_FEATURE_BHI_CTRL to select the appropriate loop count.
> > > >
> > > > I didn't speak up earlier, because I have always considered the change
> > > > in MAXPHYADDR from ICX to SPR a hard barrier for virtual machines
> > > > masquerading as a different platform. Sadly, I am now losing that
> > > > battle. :(
> > > >
> > > > If a heterogeneous migration pool includes hosts with and without
> > > > BHI_CTRL, then BHI_CTRL cannot be advertised to a guest, because it is
> > > > not possible to emulate BHI_DIS_S on a host that doesn't have it.
> > > > Hence, one cannot derive the size of the BHB from the existence of
> > > > this feature bit.
> > >
> > > As far as VMSCAPE mitigation is concerned, mitigation is done by the host
> > > so enumeration of BHI_CTRL is not a problem. The issue that you are
> > > referring to exists with or without this patch.
> >
> > The hypervisor *should* set IA32_SPEC_CTRL.BHI_DIS_S on the guest's
> > behalf when BHI_CTRL is not advertised to the guest. However, this
> > doesn't actually happen today. KVM does not support the tertiary
> > processor-based VM-execution controls bit 7 (virtualize
> > IA32_SPEC_CTRL), and KVM cedes the IA32_SPEC_CTRL MSR to the guest on
> > the first non-zero write.
>
> The first half of the series adds the support for virtualizing
> IA32_SPEC_CTRL. At least that part is worth reconsidering.
>
> https://lore.kernel.org/lkml/20240410143446.797262-1-chao.gao@xxxxxxxxx/

Yes, the support for virtualizing SPEC_CTRL should be submitted separately.

> > > I suppose your point is in the context of the native BHI mitigation for
> > > guests.
> >
> > Specific vulnerabilities aside, my point is that one cannot infer
> > anything about the underlying hardware from the presence or absence of
> > BHI_CTRL in a VM.
>
> Agree.
>
> > > > I think we need an explicit CPUID bit that a hypervisor can set to
> > > > indicate that the underlying hardware might be SPR or later.
> > >
> > > Something similar was attempted via virtual-MSRs in the below series:
> > >
> > > [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
> > > https://lore.kernel.org/lkml/20240410143446.797262-10-chao.gao@xxxxxxxxx/
> > >
> > > Do you think a rework of this approach would help?
> >
> > No, I think that whole idea is ill-conceived. As I said above, the
> > hypervisor should just set IA32_SPEC_CTRL.BHI_DIS_S on the guest's
> > behalf when BHI_CTRL is not advertised to the guest. I don't see any
> > value in predicating this mitigation on guest usage of the short BHB
> > clearing sequence. Just do it.
>
> There are cases where this would be detrimental:
>
> 1. A guest disabling the mitigation in favor of performance.
> 2. A guest deploying the long SW sequence would suffer from two mitigations
> for the same vulnerability.

The guest is already getting a performance boost from the newer
microarchitecture, so I think this argument is moot.