Re: [PATCH v2 5/6] KVM: x86: Keep a per-VM MTRR state

From: Yan Zhao
Date: Fri May 26 2023 - 04:20:44 EST


On Thu, May 25, 2023 at 07:53:41AM -0700, Sean Christopherson wrote:
> > > > > So, if KVM zaps SPTEs when CR0.CD is cleared (even when the quirk is enabled),
> > > > > then KVM can skip the MTRR zaps when CR0.CD=1 because KVM is ignoring the MTRRs
> > > > > and will zap when CR0.CD is cleared. And to avoid regressing the CR0.CD case,
> > > > > if KVM honors guest PAT for the bizarro CR0.CD quirk, then KVM only needs to
> > > > > zap non-WB MTRR ranges when CR0.CD is cleared. Since WB is weak, looking for
> > Not only non-WB ranges are required to be zapped.
> > Think about a scenario:
> > (1) one fixed MTRR range is UC initially. Its EPT entry memtype is UC, too.
> > (2) after CR0.CD=1, without zap, its EPT entry memtype is still UC.
> > (3) then guest performs MTRR disable, no zap too.
> > (4) guest updates this fixed MTRR range to WB, and performs MTRR enable.
> > (5) CR0.CD=0. we need to zap this fixed range to update the EPT memtype to WB.
>
> KVM installs WB memtype when the quirk is enabled, thus no need to zap. KVM
> currently zaps everything when the quirk is disabled, and I'm not proposing that
> be changed.
I didn't explain it clearly.

(1) one fixed MTRR range is UC initially. Its EPT entry memtype is UC, too. ==> EPT entry has been created here
(2) after CR0.CD=1, because of the quirk, no zap, the created EPT entry memtype is still UC.
(3) then guest performs MTRR disable, no zap too, according to our change.
(4) guest updates this fixed MTRR range to WB, and performs MTRR enable.
(5) CR0.CD=0. we need to zap this fixed range to update the EPT memtype to WB.==>we also need to zap WB range.


> It doesn't actually require non-coherent DMA within the CR0.CD=1 window though.
If we don't need to mind non-coherent DMA within the window CR0.CD=1 to CR0.CD=0,
do you think it's a good idea to do in this way...

(1) when CR0.CD=1, return guest mtrr type.

--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7532,12 +7532,13 @@ static u8 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
return (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;

if (kvm_read_cr0_bits(vcpu, X86_CR0_CD)) {
- if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
- cache = MTRR_TYPE_WRBACK;
- else
+ if (kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED)) {
+ cache = kvm_mtrr_get_guest_memory_type(vcpu, gfn);
+ return cache << VMX_EPT_MT_EPTE_SHIFT;
+ } else {
cache = MTRR_TYPE_UNCACHABLE;
-
- return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
+ return (cache << VMX_EPT_MT_EPTE_SHIFT) | VMX_EPT_IPAT_BIT;
+ }
}


(2) return MTRR_TYPE_WRBACK if guest mtrr has not been enabled for once
u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
@@ -628,13 +635,23 @@ u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
struct mtrr_iter iter;
u64 start, end;
int type = -1;
const int wt_wb_mask = (1 << MTRR_TYPE_WRBACK)
| (1 << MTRR_TYPE_WRTHROUGH);

+ if (!mtrr_state->enabled_once)
+ return MTRR_TYPE_WRBACK;
+


(3) when CR0.CD = 1 and the quirk is on, return MTRR type as if MTRR is enabled
+ ignore_disable = kvm_read_cr0_bits(vcpu, X86_CR0_CD) &&
+ kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED);

-static void mtrr_lookup_start(struct mtrr_iter *iter)
+static void mtrr_lookup_start(struct mtrr_iter *iter, bool ignore_disable)
{
- if (!mtrr_is_enabled(iter->mtrr_state)) {
+ if (!ignore_disable && !mtrr_is_enabled(iter->mtrr_state)) {
iter->mtrr_disabled = true;
return;
}

(4) Then we only need to do EPT zap when MTRR is enabled for the first time
and when MTRR fixed/var entries are changed at enabling MTRR or at CR0.CD=0
(I prefer at enabling MTRR, as seabios may do the MTRR disable/update/enable when
CR0.CD=0)


Besides, accoding to the commit message introducing KVM_QUIRK_CD_NW_CLEARED,
we can return MTRR_TYPE_WRBACK for CR0.CD=1 only when MTRR is not enbled for
once. (And I tried, it works!)

commit fb279950ba02e3210a16b11ecfa8871f3ee0ca49
Author: Xiao Guangrong <guangrong.xiao@xxxxxxxxx>
Date: Thu Jul 16 03:25:56 2015 +0800

KVM: vmx: obey KVM_QUIRK_CD_NW_CLEARED

OVMF depends on WB to boot fast, because it only clears caches after
it has set up MTRRs---which is too late.

Let's do writeback if CR0.CD is set to make it happy, similar to what
SVM is already doing.

> Yes, we can ignore !X86_FEATURE_MTRR guests. If someone wants to run such a
> setup with non-coherent DMA, they're welcome to send patches and write a lengthy
> explanation of why they want to run such insanity :-)
great :)

>
> > > > And pieces of no-WB memory might produce more kvm_zap_gfn_range()?
> > >
> > > Yes, but they'll be much, much more precise. And the bajillion fixed MTRRs can
> > Could we instead keep a per-VM copy of MTRR?
> > Then, if a vCPU modifies an MTRR entry and we find it's different from that
> > in the per-VM copy, we mark that entry dirty. When CD0=0, we just zap
> > those dirty ranges. (or just zap all if there are dirty entries)
> > In theory, only one vCPU will trigger this zap in each round of MTRR update.
>
> I don't think tracking "dirty" MTRRs releative to a per-VM copy will work because
> of the weird behavior of the quirk. E.g. if the below happens, vCPU1's WB SPTE
> won't be zapped because MTRRx wasn't "dirty".
>
By returning MTRR type (as if MTRR is enabled) for CR0.CD=1, we can avoid to zap
entries in this window. Agree?

> vCPU0 vCPU1
> ---------- ----------
> 1. MTRRx = UC
> 2. CR0.CD = 1 CR0.CD=1
> 3. SPTE = WB
> 4. MTRRx = UC
> 5. CR0.CD=0
>
> Another problem is that while only one vCPU will trigger the zap, KVM needs to
> stall all vCPUs that dirtied MTRRs *relative to the original per-VM state*. E.g.
> if the per-VM state for MTRRx is WB and both vCPU1 and vCPU2 dirty an MTRR and
> both clear CR0.CD, then KVM needs to stall vCPU1 and vCPU2 regardless of which
> vCPU actually does the zapping.
>
> And when handling the transition from dirty=>clean, KVM would need to prevent
> writes to MTRRs while grabbing the snapshot of dirty MTRRs, i.e. to-be-zapped
> ranges, in order to provide stable view of the clean per-VM copy.
>
> In other words, the net effect will essentially be what I sketched out with the
> precise zapping, but with the added downside of serializing writes to MTRRs while
> transitioning the per-VM state from dirty=>clean.
Yes, the net effect will essentially be what you sketched out in last mail.

I can try to figure out which way is more efficient before sending out next version
if you agree with the above direction, i.e.
(1) Do not consider non-coherent DMA when CR0.CD=1
(2) return MTRR_TYPE_WRBACK for CR0.CD=1 only when MTRR is not enabled for once.
(3) return MTRR type as if MTRR is enabled for CR0.CD=1