Re: [PATCH] KVM: x86: Track supported ARCH_CAPABILITIES in kvm_caps

From: Xiaoyao Li
Date: Thu May 25 2023 - 11:42:35 EST

On 5/23/2023 11:34 AM, Pawan Gupta wrote:
On Tue, May 23, 2023 at 09:00:50AM +0800, Xiaoyao Li wrote:
On 5/23/2023 5:23 AM, Pawan Gupta wrote:
On Tue, May 23, 2023 at 03:31:44AM +0800, Xiaoyao Li wrote:
On 5/23/2023 1:43 AM, Sean Christopherson wrote:
6. Performance aside, KVM should not be speculating (ha!) on what the guest
will and will not do, and should instead honor whatever behavior is presented
to the guest. If the guest CPU model indicates that VERW flushes buffers,
then KVM damn well needs to let VERW flush buffers.
The current implementation allows guests to have VERW flush buffers when
they enumerate FB_CLEAR. It only restricts the flush behavior when the
guest is trying to mitigate against a vulnerability(like MDS) on a
hardware that is not affected. I guess its common for guests to be
running with older gen configuration on a newer hardware.
Right, I'm saying that that behavior is wrong. KVM shouldn't assume the guest
the guest will do things a certain way and should instead honor the "architectural"
definition, in quotes because I realize there probably is no architectural
definition for any of this.

It might be that the code does (unintentionally?) honor the "architecture", i.e.
this code might actually be accurrate with respect to when the guest can expect
VERW to flush buffers. But the comment is so, so wrong.

The comment is wrong and the code is wrong in some case as well.

VM, the VM is type of "affected by MDS".

And accroding to the page

if the VM enumerates support for both L1D_FLUSH and MD_CLEAR, it implicitly
enumerates FB_CLEAR as part of their MD_CLEAR support.

This is the excerpt from the link that you mentioned:

"For processors that are affected by MDS and support L1D_FLUSH
operations and MD_CLEAR operations, the VERW instruction flushes fill

You are missing an important information here "For the processors
_affected_ by MDS". On such processors ...

However, the code will leave vmx->disable_fb_clear as 1 if hardware supports
it, and VERW intruction doesn't clear FB in the VM, which conflicts
"architectural" definition.

... Fill buffer clear is not enabled at all:

u64 msr;
if (boot_cpu_has(X86_FEATURE_ARCH_CAPABILITIES) &&
!boot_cpu_has_bug(X86_BUG_MDS) &&
!boot_cpu_has_bug(X86_BUG_TAA)) {
vmx_fb_clear_ctrl_available = true;

This is the check of bare metal, while the check in
vmx_update_fb_clear_dis() is of guest VM.

For example, if the hardware (host) enumerates ARCH_CAP_TAA_NO,
ARCH_CAP_FB_CLEAR, and ARCH_CAP_FB_CLEAR_CTRL, the VERW on this hardware
clears Fill Buffer (if FB_CLEAR_DIS is not enabled in
MSR_IA32_MCU_OPT_CTRL). vmx_setup_fb_clear_ctrl() does set
vmx_fb_clear_ctrl_available to true.

If a guest is exposed without ARCH_CAP_TAA_NO, ARCH_CAP_MDS_NO,
ARCH_CAP_FB_CLEAR, vmx_update_fb_clear_dis() will leave
vmx->disable_fb_clear as true. So VERW doesn't clear Fill Buffer for guest.
But in the view of guset, it expects VERW to clear Fill Buffer.

That is correct, but whether VERW clears the CPU buffers also depends on
if the hardware is affected or not, enumerating MD_CLEAR solely does not
guarantee that VERW will flush CPU buffers. This was true even before
MMIO Stale Data was discovered.

If host(hardware) enumerates:

MD_CLEAR | MDS_NO | VERW behavior
1 | 0 | Clears CPU buffers

But on an MDS mitigated hardware(MDS_NO=1) if guest enumerates:

MD_CLEAR | MDS_NO | VERW behavior
1 | 0 | Not guaranteed to clear
CPU buffers

After MMIO Stale Data, FB_CLEAR_DIS was introduced to keep this behavior
intact(for hardware that is not affected by MDS/TAA).

Sorry, I don't understand it. What the behavior is?

If the userspace
truly wants the guest to have VERW flush behavior, it can export
I see your point that from a guest's perspective it is being lied about
VERW behavior. OTOH, I am not sure if it is a good enough reason for
mitigated hardware to keep the overhead of clearing micro-architectural
buffers for generations of CPUs.

User takes the responsiblity because itself requests the specific feature combination for its guest.