Re: [PATCH] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps

From: Xiaoyao Li
Date: Mon May 22 2023 - 15:31:56 EST


On 5/23/2023 1:43 AM, Sean Christopherson wrote:
6. Performance aside, KVM should not be speculating (ha!) on what the guest
will and will not do, and should instead honor whatever behavior is presented
to the guest. If the guest CPU model indicates that VERW flushes buffers,
then KVM damn well needs to let VERW flush buffers.
The current implementation allows guests to have VERW flush buffers when
they enumerate FB_CLEAR. It only restricts the flush behavior when the
guest is trying to mitigate against a vulnerability(like MDS) on a
hardware that is not affected. I guess its common for guests to be
running with older gen configuration on a newer hardware.
Right, I'm saying that that behavior is wrong. KVM shouldn't assume the guest
the guest will do things a certain way and should instead honor the "architectural"
definition, in quotes because I realize there probably is no architectural
definition for any of this.

It might be that the code does (unintentionally?) honor the "architecture", i.e.
this code might actually be accurrate with respect to when the guest can expect
VERW to flush buffers. But the comment is so, so wrong.

The comment is wrong and the code is wrong in some case as well.

If none of ARCH_CAP_FB_CLEAR, ARCH_CAP_MDS_NO, ARCH_CAP_TAA_NO, ARCH_CAP_PSDP_NO, ARCH_CAP_FBSDP_NO and ARCH_CAP_SBDR_SSDP_NO are exposed to VM, the VM is type of "affected by MDS".

And accroding to the page https://www.intel.com/content/www/us/en/developer/articles/technical/software-security-guidance/technical-documentation/processor-mmio-stale-data-vulnerabilities.html

if the VM enumerates support for both L1D_FLUSH and MD_CLEAR, it implicitly enumerates FB_CLEAR as part of their MD_CLEAR support.

However, the code will leave vmx->disable_fb_clear as 1 if hardware supports it, and VERW intruction doesn't clear FB in the VM, which conflicts "architectural" definition.

/*
* If guest will not execute VERW, there is no need to set FB_CLEAR_DIS
* at VMEntry. Skip the MSR read/write when a guest has no use case to
* execute VERW.
*/
if ((vcpu->arch.arch_capabilities & ARCH_CAP_FB_CLEAR) ||
((vcpu->arch.arch_capabilities & ARCH_CAP_MDS_NO) &&
(vcpu->arch.arch_capabilities & ARCH_CAP_TAA_NO) &&
(vcpu->arch.arch_capabilities & ARCH_CAP_PSDP_NO) &&
(vcpu->arch.arch_capabilities & ARCH_CAP_FBSDP_NO) &&
(vcpu->arch.arch_capabilities & ARCH_CAP_SBDR_SSDP_NO)))
vmx->disable_fb_clear = false;