Re: [PATCH v2 3/3] KVM: x86/mmu: Don't allow TDP MMU to yield when recovering NX pages

From: Paolo Bonzini
Date: Tue Mar 30 2021 - 13:19:41 EST


On 25/03/21 23:25, Sean Christopherson wrote:
On Thu, Mar 25, 2021, Ben Gardon wrote:
On Thu, Mar 25, 2021 at 1:01 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
+static inline bool kvm_tdp_mmu_zap_gfn_range(struct kvm *kvm, gfn_t start,
+ gfn_t end)
+{
+ return __kvm_tdp_mmu_zap_gfn_range(kvm, start, end, true);
+}
+static inline bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)

I'm a little leary of adding an interface which takes a non-root
struct kvm_mmu_page as an argument to the TDP MMU.
In the TDP MMU, the struct kvm_mmu_pages are protected rather subtly.
I agree this is safe because we hold the MMU lock in write mode here,
but if we ever wanted to convert to holding it in read mode things
could get complicated fast.
Maybe this is more of a concern if the function started to be used
elsewhere since NX recovery is already so dependent on the write lock.

Agreed. Even writing the comment below felt a bit awkward when thinking about
additional users holding mmu_lock for read. Actually, I should remove that
specific blurb since zapping currently requires holding mmu_lock for write.

Ideally though, NX reclaim could use MMU read lock +
tdp_mmu_pages_lock to protect the list and do reclaim in parallel with
everything else.

Yar, processing all legacy MMU pages, and then all TDP MMU pages to avoid some
of these dependencies crossed my mind. But, it's hard to justify effectively
walking the list twice. And maintaining two lists might lead to balancing
issues, e.g. the legacy MMU and thus nested VMs get zapped more often than the
TDP MMU, or vice versa.

The nice thing about drawing the TDP MMU interface in terms of GFNs
and address space IDs instead of SPs is that it doesn't put
constraints on the implementation of the TDP MMU because those GFNs
are always going to be valid / don't require any shared memory.
This is kind of innocuous because it's immediately converted into that
gfn interface, so I don't know how much it really matters.

In any case this change looks correct and I don't want to hold up
progress with bikeshedding.
WDYT?

I think we're kind of hosed either way. Either we add a helper in the TDP MMU
that takes a SP, or we bleed a lot of information about the details of TDP MMU
into the common MMU. E.g. the function could be open-coded verbatim, but the
whole comment below, and the motivation for not feeding in flush is very
dependent on the internal details of TDP MMU.

I don't have a super strong preference. One thought would be to assert that
mmu_lock is held for write, and then it largely come future person's problem :-)

Queued all three, with lockdep_assert_held_write here.

Paolo