Re: [PATCH v3 6/6] KVM: VMX: Move VERW closer to VMentry for MDS mitigation

From: Pawan Gupta
Date: Thu Oct 26 2023 - 16:48:18 EST


On Thu, Oct 26, 2023 at 12:30:55PM -0700, Sean Christopherson wrote:
> > - /* L1D Flush includes CPU buffer clear to mitigate MDS */
> > + /*
> > + * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
> > + * mitigation for MDS is done late in VMentry and is still
> > + * executed inspite of L1D Flush. This is because an extra VERW
>
> in spite

Ok.

> > + * should not matter much after the big hammer L1D Flush.
> > + */
> > if (static_branch_unlikely(&vmx_l1d_should_flush))
> > vmx_l1d_flush(vcpu);
>
> There's an existing bug here. vmx_l1d_flush() is not guaranteed to do a flush in
> "conditional mode", and is not guaranteed to do a ucode-based flush

AFAICT, conditional mode decides based on whether any sensitive data
could have been touched since the last VMexit. If the L1TF mitigation
doesn't consider certain data sensitive and skips the L1D flush,
executing VERW provides no protection either, since that data can
anyway be leaked from L1D using L1TF.

> (though I can't tell if it's possible for the VERW magic to exist
> without X86_FEATURE_FLUSH_L1D).

Likely not; ucode that adds the VERW behavior should also have
X86_FEATURE_FLUSH_L1D, as L1TF was mitigated before MDS.

> If we care, something like the diff at the bottom is probably needed.
>
> > - else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
> > - mds_clear_cpu_buffers();
> > else if (static_branch_unlikely(&mmio_stale_data_clear) &&
> > kvm_arch_has_assigned_device(vcpu->kvm))
> > + /* MMIO mitigation is mutually exclusive with MDS mitigation later in asm */
>
> Please don't put comments inside an if/elif without curly braces (and I don't
> want to add curly braces). Though I think that's a moot point if we first fix
> the conditional L1D flush issue. E.g. when the dust settles we can end up with:

Ok.

> /*
> * Note, a ucode-based L1D flush also flushes CPU buffers, i.e. the
> * manual VERW in __vmx_vcpu_run() to mitigate MDS *may* be redundant.
> * But an L1D Flush is not guaranteed for "conditional mode", and the
> * cost of an extra VERW after a full L1D flush is negligible.
> */
> if (static_branch_unlikely(&vmx_l1d_should_flush))
> cpu_buffers_flushed = vmx_l1d_flush(vcpu);
>
> /*
> * The MMIO stale data vulnerability is a subset of the general MDS
> * vulnerability, i.e. this is mutually exclusive with the VERW that's
> * done just before VM-Enter. The vulnerability requires the attacker,
> * i.e. the guest, to do MMIO, so this "clear" can be done earlier.
> */
> if (static_branch_unlikely(&mmio_stale_data_clear) &&
> !cpu_buffers_flushed && kvm_arch_has_assigned_device(vcpu->kvm))
> mds_clear_cpu_buffers();

This is certainly better, but I don't know what scenario this is helping with.

> > mds_clear_cpu_buffers();
> >
> > vmx_disable_fb_clear(vmx);
>
> LOL, nice. IIUC, setting FB_CLEAR_DIS is mutually exclusive with doing a late
> VERW, as KVM will never set FB_CLEAR_DIS if the CPU is susceptible to X86_BUG_MDS.
> But the checks aren't identical, which makes this _look_ sketchy.
>
> Can you do something like this to ensure we don't accidentally neuter the late VERW?
>
> static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
> {
> vmx->disable_fb_clear = (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
> !boot_cpu_has_bug(X86_BUG_MDS) &&
> !boot_cpu_has_bug(X86_BUG_TAA);
>
> if (vmx->disable_fb_clear &&
> WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF)))
> vmx->disable_fb_clear = false;

Will do, this makes a lot of sense.