Re: [PATCH v4 04/11] x86/bhi: Make clear_bhb_loop() effective on newer CPUs

From: Pawan Gupta
Date: Fri Mar 06 2026 - 18:31:23 EST

On Fri, Mar 06, 2026 at 02:57:13PM -0800, Jim Mattson wrote:
> On Fri, Mar 6, 2026 at 2:32 PM Pawan Gupta
> <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Mar 06, 2026 at 01:00:15PM -0800, Jim Mattson wrote:
> > > On Wed, Nov 19, 2025 at 10:19 PM Pawan Gupta
> > > <pawan.kumar.gupta@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite
> > > > the Branch History Buffer (BHB). On Alder Lake and newer parts, this
> > > > sequence is not sufficient because it doesn't clear enough entries. This
> > > > was not an issue because these CPUs have a hardware control (BHI_DIS_S)
> > > > that mitigates BHI in the kernel.
> > > >
> > > > The BHI variant of VMSCAPE requires isolating branch history between guests
> > > > and userspace. Note that there is no equivalent hardware control for
> > > > userspace. To effectively isolate branch history on newer CPUs,
> > > > clear_bhb_loop() should execute a sufficient number of branches to clear a
> > > > larger BHB.
> > > >
> > > > Dynamically set the loop count of clear_bhb_loop() such that it is
> > > > effective on newer CPUs too. Use the hardware control enumeration
> > > > X86_FEATURE_BHI_CTRL to select the appropriate loop count.
> > >
> > > I didn't speak up earlier, because I have always considered the change
> > > in MAXPHYADDR from ICX to SPR a hard barrier for virtual machines
> > > masquerading as a different platform. Sadly, I am now losing that
> > > battle. :(
> > >
> > > If a heterogeneous migration pool includes hosts with and without
> > > BHI_CTRL, then BHI_CTRL cannot be advertised to a guest, because it is
> > > not possible to emulate BHI_DIS_S on a host that doesn't have it.
> > > Hence, one cannot derive the size of the BHB from the existence of
> > > this feature bit.
> >
> > As far as VMSCAPE mitigation is concerned, mitigation is done by the host,
> > so enumeration of BHI_CTRL is not a problem. The issue that you are
> > referring to exists with or without this patch.
>
> The hypervisor *should* set IA32_SPEC_CTRL.BHI_DIS_S on the guest's
> behalf when BHI_CTRL is not advertised to the guest. However, this
> doesn't actually happen today. KVM does not support the tertiary
> processor-based VM-execution controls bit 7 (virtualize
> IA32_SPEC_CTRL), and KVM cedes the IA32_SPEC_CTRL MSR to the guest on
> the first non-zero write.

The first half of the series adds support for virtualizing
IA32_SPEC_CTRL. At least that part is worth reconsidering.

https://lore.kernel.org/lkml/20240410143446.797262-1-chao.gao@xxxxxxxxx/
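A simplified model of what virtualizing IA32_SPEC_CTRL would buy here: the host keeps BHI_DIS_S set in the MSR actually loaded on the CPU, while the guest still reads back exactly the value it wrote. This is a sketch under stated assumptions, not the VMX feature or KVM code. The struct and function names are hypothetical, and the OR-based merge is a simplification; the real mask/shadow semantics are defined in the Intel SDM. Only the bit position of BHI_DIS_S (bit 10) matches the kernel's definition.

```c
#include <stdint.h>

/* IA32_SPEC_CTRL.BHI_DIS_S is bit 10, as in the kernel's SPEC_CTRL_BHI_DIS_S. */
#define SPEC_CTRL_BHI_DIS_S (1ULL << 10)

/*
 * Hypothetical model of a virtualized IA32_SPEC_CTRL. Field names are
 * illustrative only; they are not KVM's or the VMCS's actual field names.
 */
struct vspec_ctrl {
	uint64_t shadow;      /* value the guest last wrote and reads back */
	uint64_t host_forced; /* bits the host keeps set regardless of the guest */
	uint64_t hw;          /* value actually loaded into the MSR */
};

/* Guest WRMSR: record the guest's value, but keep host-forced bits set. */
static void guest_wrmsr(struct vspec_ctrl *v, uint64_t val)
{
	v->shadow = val;
	v->hw = val | v->host_forced;
}

/* Guest RDMSR: return the guest's own view, hiding host-forced bits. */
static uint64_t guest_rdmsr(const struct vspec_ctrl *v)
{
	return v->shadow;
}
```

With host_forced set to SPEC_CTRL_BHI_DIS_S, a guest that writes 0 still runs with BHI_DIS_S set in hardware, yet reads 0 back, so it never observes a bit it does not know about.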

> > I suppose your point is in the context of native BHI mitigation for
> > guests.
>
> Specific vulnerabilities aside, my point is that one cannot infer
> anything about the underlying hardware from the presence or absence of
> BHI_CTRL in a VM.

Agree.
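A toy model of why selecting the loop count from BHI_CTRL is unreliable inside a VM. It assumes, as the patch does, that BHI_CTRL presence tracks whether the hardware needs the longer sequence, and that a heterogeneous migration pool only advertises features common to every host. The enum and helper names are hypothetical; in the patch, the selection is made in clear_bhb_loop() based on X86_FEATURE_BHI_CTRL.

```c
#include <stdbool.h>

enum bhb_seq { BHB_SEQ_SHORT, BHB_SEQ_LONG };

/* Selection as described in the patch: long sequence iff BHI_CTRL is enumerated. */
static enum bhb_seq select_bhb_seq(bool cpuid_bhi_ctrl)
{
	return cpuid_bhi_ctrl ? BHB_SEQ_LONG : BHB_SEQ_SHORT;
}

/* Hypothetical host model; has_bhi_ctrl stands in for "Alder Lake or newer". */
struct host {
	bool has_bhi_ctrl;
};

/*
 * In a heterogeneous migration pool, a guest is only shown features that
 * every host has, so BHI_CTRL is advertised only if all hosts have it.
 */
static bool pool_advertises_bhi_ctrl(const struct host *hosts, int n)
{
	for (int i = 0; i < n; i++)
		if (!hosts[i].has_bhi_ctrl)
			return false;
	return true;
}

/* Is the guest's chosen sequence long enough on the host it currently runs on? */
static bool seq_sufficient(enum bhb_seq seq, const struct host *h)
{
	return seq == BHB_SEQ_LONG || !h->has_bhi_ctrl;
}
```

With a pool of one older host and one newer host, BHI_CTRL is hidden, the guest picks the short sequence, and that sequence is insufficient whenever the guest lands on the newer host.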

> > > I think we need an explicit CPUID bit that a hypervisor can set to
> > > indicate that the underlying hardware might be SPR or later.
> >
> > Something similar was attempted via virtual-MSRs in the below series:
> >
> > [RFC PATCH v3 09/10] KVM: VMX: Advertise MITI_CTRL_BHB_CLEAR_SEQ_S_SUPPORT
> > https://lore.kernel.org/lkml/20240410143446.797262-10-chao.gao@xxxxxxxxx/
> >
> > Do you think a rework of this approach would help?
>
> No, I think that whole idea is ill-conceived. As I said above, the
> hypervisor should just set IA32_SPEC_CTRL.BHI_DIS_S on the guest's
> behalf when BHI_CTRL is not advertised to the guest. I don't see any
> value in predicating this mitigation on guest usage of the short BHB
> clearing sequence. Just do it.

There are cases where this would be detrimental:

1. A guest that disables the mitigation in favor of performance would be
   forced to pay for it anyway.
2. A guest deploying the long SW sequence would suffer from two mitigations
   for the same vulnerability.
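The two cases can be made concrete with a toy policy check. It models the proposal of forcing BHI_DIS_S whenever BHI_CTRL is not advertised, and flags the guest configurations where that forcing only adds cost. The guest modes and helpers are hypothetical names for illustration; none of them correspond to an existing KVM interface.

```c
#include <stdbool.h>

/* Hypothetical guest BHI configurations, for illustration only. */
enum guest_bhi_mode {
	GUEST_BHI_OFF,       /* guest disabled the mitigation for performance */
	GUEST_BHI_SHORT_SEQ, /* guest runs the short BHB-clearing sequence */
	GUEST_BHI_LONG_SEQ,  /* guest runs the long BHB-clearing sequence */
};

/* "Just do it": force BHI_DIS_S whenever BHI_CTRL is hidden from the guest. */
static bool host_forces_bhi_dis_s(bool bhi_ctrl_advertised)
{
	return !bhi_ctrl_advertised;
}

/*
 * Forcing is detrimental when the guest opted out (case 1) or already
 * runs the long sequence and would pay for two mitigations (case 2).
 */
static bool forcing_is_detrimental(enum guest_bhi_mode mode, bool forced)
{
	return forced && (mode == GUEST_BHI_OFF || mode == GUEST_BHI_LONG_SEQ);
}
```

A guest using the short sequence is the one case in this model where forcing BHI_DIS_S helps rather than hurts, which matches predicating the behavior on guest usage of the short BHB clearing sequence.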