Re: [RFC 11/19] KVM: x86/mmu: Factor shadow_zero_check out of make_spte

From: Sean Christopherson
Date: Wed Nov 17 2021 - 22:29:52 EST


On Thu, Nov 18, 2021, Sean Christopherson wrote:
> On Wed, Nov 10, 2021, Ben Gardon wrote:
> > In the interest of developing a version of make_spte that can function
> > without a vCPU pointer, factor out the shadow_zero_check to be an
> > additional argument to the function.
> >
> > No functional change intended.
> >
> >
> > Signed-off-by: Ben Gardon <bgardon@xxxxxxxxxx>
> > ---
> > arch/x86/kvm/mmu/spte.c | 11 +++++++----
> > arch/x86/kvm/mmu/spte.h | 3 ++-
> > 2 files changed, 9 insertions(+), 5 deletions(-)
> >
> > diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
> > index b7271daa06c5..d3b059e96c6e 100644
> > --- a/arch/x86/kvm/mmu/spte.c
> > +++ b/arch/x86/kvm/mmu/spte.c
> > @@ -93,7 +93,8 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
> > struct kvm_memory_slot *slot, unsigned int pte_access,
> > gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool prefetch,
> > bool can_unsync, bool host_writable, bool ad_need_write_protect,
> > - u64 mt_mask, u64 *new_spte)
> > + u64 mt_mask, struct rsvd_bits_validate *shadow_zero_check,
>
> Ugh, so I had a big email written about how I think we should add a module param
> to control 4-level vs. 5-level for all TDP pages, but then I realized it wouldn't
> work for nested EPT because that follows the root level used by L1. We could
> still make a global non_nested_tdp_shadow_zero_check or whatever, but then make_spte()
> would have to do some work to find the right rsvd_bits_validate, and the end result
> would likely be a mess.
>
> One idea to avoid exploding make_spte() would be to add a backpointer to the MMU
> in kvm_mmu_page. I don't love the idea, but I also don't love passing in rsvd_bits_validate.

Another idea. The only difference between 5-level and 4-level is that 5-level
fills in index [4], and I'm pretty sure 4-level doesn't touch that index. For
PAE NPT (32-bit SVM), the shadow root level will never change, so that's not an issue.

Nested NPT is the only case where anything for an EPT/NPT MMU can change, because
that follows EFER.NX.

In other words, the non-nested TDP reserved bits don't need to be recalculated
regardless of level. KVM can just fill in the 5-level masks once and leave them be.
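
To illustrate why filling in 5-level unconditionally should be harmless, here is
a rough userspace sketch, not kernel code. The struct, the DEMO_RSVD mask and
all function names are made up for the example, and the table is simplified to
one mask per level, indexed by level - 1, so index [4] is the 5-level entry.
It assumes, as above, that 4-level never looks at index [4].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_LEVELS 5

/* Hypothetical stand-in for the reserved-bits table: one mask per level. */
struct rsvd_masks {
	uint64_t mask[MAX_LEVELS];	/* mask[0] is level 1, mask[4] is level 5 */
};

/* Made-up value: pretend bits 63:52 are reserved at every populated level. */
#define DEMO_RSVD 0xfff0000000000000ULL

/* Populate entries for levels 1..root_level and leave the rest zeroed. */
static void fill_rsvd_masks(struct rsvd_masks *m, int root_level)
{
	for (int i = 0; i < MAX_LEVELS; i++)
		m->mask[i] = (i < root_level) ? DEMO_RSVD : 0;
}

/* Report whether the SPTE has a reserved bit set for the given level. */
static bool is_rsvd(const struct rsvd_masks *m, uint64_t spte, int level)
{
	assert(level >= 1 && level <= MAX_LEVELS);
	return (spte & m->mask[level - 1]) != 0;
}

/*
 * Compare a table filled for 4-level against one filled for 5-level at
 * every level a 4-level walk can reach.  They differ only at index [4].
 */
static bool tables_agree_up_to_level4(uint64_t spte)
{
	struct rsvd_masks four, five;

	fill_rsvd_masks(&four, 4);
	fill_rsvd_masks(&five, 5);

	for (int level = 1; level <= 4; level++) {
		if (is_rsvd(&four, spte, level) != is_rsvd(&five, spte, level))
			return false;
	}
	return true;
}
```

If that assumption holds, a single table filled for 5-level gives the same
answer as a 4-level table for every level a 4-level MMU actually uses.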

E.g. something like the below. The sp->role.direct check could be removed if we
forced EFER.NX for nested NPT.

It's a bit ugly in that we'd pass both @kvm and @vcpu, so that needs some more
thought, but at minimum it means there's no need to recalc the reserved bits.

diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 84e64dbdd89e..05db9b89dc53 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -95,10 +95,18 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
u64 old_spte, bool prefetch, bool can_unsync,
bool host_writable, u64 *new_spte)
{
+ struct rsvd_bits_validate *rsvd_check;
int level = sp->role.level;
u64 spte = SPTE_MMU_PRESENT_MASK;
bool wrprot = false;

+ if (vcpu) {
+ rsvd_check = &vcpu->arch.mmu->shadow_zero_check;
+ } else {
+ WARN_ON_ONCE(!tdp_enabled || !sp->role.direct);
+ rsvd_check = tdp_shadow_rsvd_bits;
+ }
+
if (sp->role.ad_disabled)
spte |= SPTE_TDP_AD_DISABLED_MASK;
else if (kvm_mmu_page_ad_need_write_protect(sp))
@@ -177,9 +185,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
if (prefetch)
spte = mark_spte_for_access_track(spte);

- WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
+ WARN_ONCE(is_rsvd_spte(rsvd_check, spte, level),
"spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
- get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
+ get_rsvd_bits(rsvd_check, spte, level));

if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
/* Enforced by kvm_mmu_hugepage_adjust. */