Re: [PATCH] KVM: x86: Expose ARCH_CAP_FB_CLEAR when invulnerable to MDS

From: Sean Christopherson
Date: Wed Apr 02 2025 - 09:37:49 EST


On Mon, Mar 31, 2025, Jon Kohler wrote:
> Expose FB_CLEAR in arch_capabilities for certain MDS-invulnerable cases
> to support live migration from older hardware (e.g., Cascade Lake, Ice
> Lake) to newer hardware (e.g., Sapphire Rapids or higher). This ensures
> compatibility when user space has previously configured vCPUs to see
> FB_CLEAR (ARCH_CAPABILITIES Bit 17).
>
> Newer hardware sets the following bits but does not set FB_CLEAR, which
> can prevent user space from configuring a matching setup:

I looked at this again right after PUCK, and KVM does NOT actually prevent
userspace from matching the original, pre-SPR configuration. KVM effectively
treats ARCH_CAPABILITIES like a CPUID leaf, and lets userspace shove in any
value. I.e. userspace can still migrate the VM and stuff FB_CLEAR into the
guest's ARCH_CAPABILITIES irrespective of hardware support, and thus there is
no need for KVM to lie to userspace.
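
For illustration, stuffing the value from userspace could look roughly like the
sketch below. It assumes an already-created vCPU fd and a Linux x86 build with
the KVM uapi headers; set_guest_arch_capabilities() is a hypothetical helper
name, not something from an existing VMM, and the MSR index and bit position
are redefined locally from their architectural values.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Architectural values, redefined here so the sketch does not depend on
 * kernel-internal headers. */
#define MSR_IA32_ARCH_CAPABILITIES	0x10a
#define ARCH_CAP_FB_CLEAR		(1ULL << 17)

/*
 * Hypothetical helper: write 'value' into the vCPU's ARCH_CAPABILITIES.
 * Returns 0 on success, -1 on failure.
 */
static int set_guest_arch_capabilities(int vcpu_fd, uint64_t value)
{
	struct kvm_msrs *msrs;
	int ret;

	/* struct kvm_msrs ends in a flexible array; allocate one entry. */
	msrs = calloc(1, sizeof(*msrs) + sizeof(msrs->entries[0]));
	if (!msrs)
		return -1;

	msrs->nmsrs = 1;
	msrs->entries[0].index = MSR_IA32_ARCH_CAPABILITIES;
	msrs->entries[0].data = value;

	/* KVM_SET_MSRS returns the number of MSRs written, or -1 on error. */
	ret = ioctl(vcpu_fd, KVM_SET_MSRS, msrs);
	free(msrs);

	return ret == 1 ? 0 : -1;
}
```

On the destination, the VMM would call this with the value the guest saw on
the source, e.g. one that includes ARCH_CAP_FB_CLEAR, instead of first masking
it against what the destination host reports.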

So in effect, this is a userspace problem: userspace is being too aggressive in
its sanity checks.
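
As a rough sketch of what a less aggressive check might look like, assuming
userspace wants to keep some validation: the condition from the patch can live
in the VMM instead of in KVM. arch_caps_unsatisfied() and FB_CLEAR_NOT_NEEDED
are hypothetical names, and the policy is only an illustration of moving the
patch's test to userspace, not a description of what any VMM does today.

```c
#include <stdint.h>

/* IA32_ARCH_CAPABILITIES bit positions (architectural values). */
#define ARCH_CAP_MDS_NO		(1ULL << 5)
#define ARCH_CAP_TAA_NO		(1ULL << 8)
#define ARCH_CAP_SBDR_SSDP_NO	(1ULL << 13)
#define ARCH_CAP_FBSDP_NO	(1ULL << 14)
#define ARCH_CAP_PSDP_NO	(1ULL << 15)
#define ARCH_CAP_FB_CLEAR	(1ULL << 17)

/* Hypothetical mask: the five bits the patch tests before setting FB_CLEAR. */
#define FB_CLEAR_NOT_NEEDED	(ARCH_CAP_MDS_NO | ARCH_CAP_TAA_NO |	\
				 ARCH_CAP_PSDP_NO | ARCH_CAP_FBSDP_NO |	\
				 ARCH_CAP_SBDR_SSDP_NO)

/*
 * Hypothetical helper: return the guest-visible bits that the destination
 * host cannot honour. FB_CLEAR is not counted as missing when the host
 * enumerates all five *_NO bits, mirroring the condition in the patch.
 */
static uint64_t arch_caps_unsatisfied(uint64_t guest, uint64_t host)
{
	uint64_t missing = guest & ~host;

	if ((host & FB_CLEAR_NOT_NEEDED) == FB_CLEAR_NOT_NEEDED)
		missing &= ~ARCH_CAP_FB_CLEAR;

	return missing;
}
```

A VMM would refuse the migration only if this returns non-zero, so a guest
configured with FB_CLEAR on the source is still accepted on a host that sets
the *_NO bits but not FB_CLEAR.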

FWIW, even if KVM did reject unsupported ARCH_CAPABILITIES bits, I would still
say this is userspace's problem to solve. E.g. by using MSR filtering to
intercept and emulate RDMSR(ARCH_CAPABILITIES) in userspace.
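
A minimal sketch of that approach, assuming a VM fd, a kernel with
KVM_CAP_X86_USER_SPACE_MSR and KVM_X86_SET_MSR_FILTER, and a VMM run loop that
calls the handler on each exit. Both function names are hypothetical and
error handling is reduced to a return code.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

#define MSR_IA32_ARCH_CAPABILITIES	0x10a

/* One-bit bitmap for the single-MSR range; a clear bit denies the access. */
static uint8_t arch_caps_deny_bitmap[1] = { 0 };

/* Hypothetical helper: make guest RDMSR(ARCH_CAPABILITIES) exit to userspace. */
static int intercept_arch_capabilities_reads(int vm_fd)
{
	struct kvm_enable_cap cap;
	struct kvm_msr_filter filter;

	/* Ask KVM to bounce filtered MSR accesses to userspace. */
	memset(&cap, 0, sizeof(cap));
	cap.cap = KVM_CAP_X86_USER_SPACE_MSR;
	cap.args[0] = KVM_MSR_EXIT_REASON_FILTER;
	if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
		return -1;

	/* Allow everything by default, deny reads of this one MSR. */
	memset(&filter, 0, sizeof(filter));
	filter.flags = KVM_MSR_FILTER_DEFAULT_ALLOW;
	filter.ranges[0].flags = KVM_MSR_FILTER_READ;
	filter.ranges[0].base = MSR_IA32_ARCH_CAPABILITIES;
	filter.ranges[0].nmsrs = 1;
	filter.ranges[0].bitmap = arch_caps_deny_bitmap;

	return ioctl(vm_fd, KVM_X86_SET_MSR_FILTER, &filter) < 0 ? -1 : 0;
}

/*
 * Hypothetical run-loop hook: complete a filtered RDMSR with the value the
 * guest is supposed to see. Returns 1 if the exit was handled, 0 otherwise.
 */
static int handle_arch_capabilities_rdmsr(struct kvm_run *run,
					  uint64_t guest_value)
{
	if (run->exit_reason != KVM_EXIT_X86_RDMSR ||
	    run->msr.index != MSR_IA32_ARCH_CAPABILITIES)
		return 0;

	run->msr.data = guest_value;
	run->msr.error = 0;	/* 0 = success, non-zero injects #GP */
	return 1;
}
```

The filter only applies to guest accesses, so the value the VMM reports here
can include FB_CLEAR regardless of what the host enumerates.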

> ARCH_CAP_MDS_NO
> ARCH_CAP_TAA_NO
> ARCH_CAP_PSDP_NO
> ARCH_CAP_FBSDP_NO
> ARCH_CAP_SBDR_SSDP_NO
>
> This change has minimal impact, as these bit combinations already mark
> the host as MMIO immune (via arch_cap_mmio_immune()) and set
> disable_fb_clear in vmx_update_fb_clear_dis(), resulting in no
> additional overhead.
>
> Cc: Emanuele Giuseppe Esposito <eesposit@xxxxxxxxxx>
> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> Cc: Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx>
> Signed-off-by: Jon Kohler <jon@xxxxxxxxxxx>
>
> ---
> arch/x86/kvm/x86.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index c841817a914a..2a4337aa78cd 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1641,6 +1641,20 @@ static u64 kvm_get_arch_capabilities(void)
>  	if (!boot_cpu_has_bug(X86_BUG_GDS) || gds_ucode_mitigated())
>  		data |= ARCH_CAP_GDS_NO;
>
> +	/*
> +	 * User space might set FB_CLEAR when starting a vCPU on a system
> +	 * that does not enumerate FB_CLEAR but is also invulnerable to
> +	 * other various MDS related bugs. To allow live migration from
> +	 * hosts that do implement FB_CLEAR, leave it enabled.
> +	 */
> +	if ((data & ARCH_CAP_MDS_NO) &&
> +	    (data & ARCH_CAP_TAA_NO) &&
> +	    (data & ARCH_CAP_PSDP_NO) &&
> +	    (data & ARCH_CAP_FBSDP_NO) &&
> +	    (data & ARCH_CAP_SBDR_SSDP_NO)) {
> +		data |= ARCH_CAP_FB_CLEAR;
> +	}
> +
>  	return data;
>  }
>
> --
> 2.43.0
>