Re: [PATCH v2 3/3] KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages

From: Ben Gardon
Date: Thu Mar 25 2021 - 18:47:06 EST


On Thu, Mar 25, 2021 at 3:25 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> On Thu, Mar 25, 2021, Ben Gardon wrote:
> > On Thu, Mar 25, 2021 at 1:01 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > +static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start,
> > > + gfn_t end)
> > > +{
> > > + return __kvm_tdp_mmu_zap_gfn_range(kvm, start, end, true);
> > > +}
> > > +static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> >
> > I'm a little leary of adding an interface which takes a non-root
> > struct kvm_mmu_page as an argument to the TDP MMU.
> > In the TDP MMU, the struct kvm_mmu_pages are protected rather subtly.
> > I agree this is safe because we hold the MMU lock in write mode here,
> > but if we ever wanted to convert to holding it in read mode things
> > could get complicated fast.
> > Maybe this is more of a concern if the function started to be used
> > elsewhere since NX recovery is already so dependent on the write lock.
>
> Agreed. Even writing the comment below felt a bit awkward when thinking about
> additional users holding mmu_lock for read. Actually, I should remove that
> specific blurb since zapping currently requires holding mmu_lock for write.
>
> > Ideally though, NX reclaim could use MMU read lock +
> > tdp_mmu_pages_lock to protect the list and do reclaim in parallel with
> > everything else.
>
> Yar, processing all legacy MMU pages, and then all TDP MMU pages to avoid some
> of these dependencies crossed my mind. But, it's hard to justify effectively
> walking the list twice. And maintaining two lists might lead to balancing
> issues, e.g. the legacy MMU and thus nested VMs get zapped more often than the
> TDP MMU, or vice versa.

I think in an earlier version of the TDP that I sent out, NX reclaim
was a seperate thread for the two MMUs, sidestepping the balance
issue.
I think the TDP MMU also had a seperate NX reclaim list.
That would also make it easier to do something under the read lock.

>
> > The nice thing about drawing the TDP MMU interface in terms of GFNs
> > and address space IDs instead of SPs is that it doesn't put
> > constraints on the implementation of the TDP MMU because those GFNs
> > are always going to be valid / don't require any shared memory.
> > This is kind of innocuous because it's immediately converted into that
> > gfn interface, so I don't know how much it really matters.
> >
> > In any case this change looks correct and I don't want to hold up
> > progress with bikeshedding.
> > WDYT?
>
> I think we're kind of hosed either way. Either we add a helper in the TDP MMU
> that takes a SP, or we bleed a lot of information about the details of TDP MMU
> into the common MMU. E.g. the function could be open-coded verbatim, but the
> whole comment below, and the motivation for not feeding in flush is very
> dependent on the internal details of TDP MMU.
>
> I don't have a super strong preference. One thought would be to assert that
> mmu_lock is held for write, and then it largely come future person's problem :-)

Yeah, I agree and I'm happy to kick this proverbial can down the road
until we actually add an NX reclaim implementation that uses the MMU
read lock.

>
> > > +{
> > > + gfn_t end = sp->gfn + KVM_PAGES_PER_HPAGE(sp->role.level);
> > > +
> > > + /*
> > > + * Don't allow yielding, as the caller may have a flush pending. Note,
> > > + * if mmu_lock is held for write, zapping will never yield in this case,
> > > + * but explicitly disallow it for safety. The TDP MMU does not yield
> > > + * until it has made forward progress (steps sideways), and when zapping
> > > + * a single shadow page that it's guaranteed to see (thus the mmu_lock
> > > + * requirement), its "step sideways" will always step beyond the bounds
> > > + * of the shadow page's gfn range and stop iterating before yielding.
> > > + */
> > > + return __kvm_tdp_mmu_zap_gfn_range(kvm, sp->gfn, end, false);
> > > +}
> > > void kvm_tdp_mmu_zap_all(struct kvm *kvm);
> > >
> > > int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
> > > --
> > > 2.31.0.291.g576ba9dcdaf-goog
> > >