Re: [PATCH v2 5/6] KVM: x86: Keep a per-VM MTRR state

From: Yan Zhao
Date: Mon May 29 2023 - 21:44:44 EST


On Fri, May 26, 2023 at 09:09:08AM -0700, Sean Christopherson wrote:
> On Fri, May 26, 2023, Yan Zhao wrote:
> > On Thu, May 25, 2023 at 07:53:41AM -0700, Sean Christopherson wrote:
> > > > > > > So, if KVM zaps SPTEs when CR0.CD is cleared (even when the quirk is enabled),
> > > > > > > then KVM can skip the MTRR zaps when CR0.CD=1 because KVM is ignoring the MTRRs
> > > > > > > and will zap when CR0.CD is cleared. And to avoid regressing the CR0.CD case,
> > > > > > > if KVM honors guest PAT for the bizarro CR0.CD quirk, then KVM only needs to
> > > > > > > zap non-WB MTRR ranges when CR0.CD is cleared. Since WB is weak, looking for
> > > > Not only non-WB ranges are required to be zapped.
> > > > Think about a scenario:
> > > > (1) one fixed MTRR range is UC initially. Its EPT entry memtype is UC, too.
> > > > (2) after CR0.CD=1, without zap, its EPT entry memtype is still UC.
> > > > (3) then guest performs MTRR disable, no zap too.
> > > > (4) guest updates this fixed MTRR range to WB, and performs MTRR enable.
> > > > (5) CR0.CD=0. we need to zap this fixed range to update the EPT memtype to WB.
> > >
> > > KVM installs WB memtype when the quirk is enabled, thus no need to zap. KVM
> > > currently zaps everything when the quirk is disabled, and I'm not proposing that
> > > be changed.
> > I didn't explain it clearly.
> >
> > (1) one fixed MTRR range is UC initially. Its EPT entry memtype is UC, too. ==> EPT entry has been created here
> > (2) after CR0.CD=1, because of the quirk, no zap, the created EPT entry memtype is still UC.
> > (3) then guest performs MTRR disable, no zap too, according to our change.
> > (4) guest updates this fixed MTRR range to WB, and performs MTRR enable.
> > (5) CR0.CD=0. we need to zap this fixed range to update the EPT memtype to WB.==>we also need to zap WB range.
>
> Ugh, right. But that case can be handled by zapping non-WB ranges on CR0.CD being
> set. Hmm, and KVM would need to zap everything if CR0.CD is toggled with MTRRs
Ok. Actually even with below "zap everything always", I didn't observe any performance
downgrade on OVMF in my side regarding to this change as we skipped EPT zap in
update_mtrr() when CR0.CD=1.
(even 3 less zaps for vCPU 0 and 1 less zap for APs as initially there are several MTRRs
disabled when CR0.CD=1 without accompanying CR0.CD toggle).

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dadf80fb85e9..ad3c4dc9d7bf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -941,9 +941,9 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon

if (((cr0 ^ old_cr0) & X86_CR0_CD) &&
- kvm_mmu_honors_guest_mtrr(vcpu->kvm) &&
- !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)
+ kvm_mmu_honors_guest_mtrr(vcpu->kvm)
kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
}
EXPORT_SYMBOL_GPL(kvm_post_set_cr0);



> disabled, but I don't think OVMF ever does that. Zapping on CR0.CD toggling would
> would likely introduce a small performance hit for SeaBIOS due to SeaBIOS clearing
> CR0.CD immediately after SIPI, i.e. with MTRRs disabled, but that's arguably a
For SeaBIOS, I also observed 1 less of EPT zap for each vCPU, because
there are 3 more MTRR toggles than CR0.CD toggles for each vCPU, and only 2 MTRR
toggles happen when CR0.CD=0.

> correctness fix since the quirk means KVM incorrectly runs the vCPU with WB SPTEs
> until MTRRs are programmed.

> With precise+batched zapping, zapping non-WB ranges even when CR0.CD is set should
> still be a healthy performance boost for OVMF.
Ok. I'll try to implement in this way in the next version.


> > (2) return MTRR_TYPE_WRBACK if guest mtrr has not been enabled for once
> > u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
> > @@ -628,13 +635,23 @@ u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
> > struct mtrr_iter iter;
> > u64 start, end;
> > int type = -1;
> > const int wt_wb_mask = (1 << MTRR_TYPE_WRBACK)
> > | (1 << MTRR_TYPE_WRTHROUGH);
> >
> > + if (!mtrr_state->enabled_once)
> > + return MTRR_TYPE_WRBACK;
>
> No, because this assumes too many things about the guest, and completely falls
> apart if the guest reboots.
Ah, right.

> I am not willing to lean on the historical commit messages for the quirk to
> justify any change. I'm not at all convinced that there was any meaningful thought
> put into ensuring correctness.
>
> > we can return MTRR_TYPE_WRBACK for CR0.CD=1 only when MTRR is not enbled for
> > once. (And I tried, it works!)
>
> On your system, for your setup. The quirk terrifies me because it likely affects
> every KVM-based VM out there (I can't find a single VMM that disables the quirk).
> These changes are limited to VMs with noncoherent DMA, but still.
>
> In short, I am steadfastly against any change that piles more arbitrary behavior
> functional behavior on top of the quirk, especially when that behavior relies on
> heuristics to try and guess what the guest is doing.

Ok, yes, this is the right insistence.