Re: [PATCH v3 12/15] KVM: MMU: fast invalid all shadow pages

From: Marcelo Tosatti
Date: Thu Apr 18 2013 - 09:45:57 EST


On Thu, Apr 18, 2013 at 12:00:16PM +0800, Xiao Guangrong wrote:
> >
> > What is the justification for this?
>
> We want the rmap of the memslot being deleted to be remove-only; that is
> what is needed for unmapping the rmap outside of mmu-lock.
>
> ======
> 1) do not corrupt the rmap
> 2) keep pte-list-descs available
> 3) keep shadow page available
>
> Resolve 1):
> we make the invalid rmap remove-only, which means we only delete and
> clear sptes from the rmap; no new sptes can be added to it.
> This is reasonable since kvm cannot do address translation through an invalid
> rmap (gfn_to_pfn fails on an invalid memslot) and the sptes on an invalid rmap
> cannot be reused (they belong to invalid shadow pages).
> ======
>
> The clear_flush_young / test_young / change_pte mmu-notifier handlers can
> rewrite the rmap with a present spte (P bit set), so we should unmap the
> rmap in these handlers instead.
>
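(To make the invariant above concrete, here is a toy userspace model -- not
the real KVM rmap code; rmap_add / rmap_remove / notifier_change_pte are
simplified stand-ins for the pte-list / mmu-notifier paths:)

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define RMAP_MAX 16

struct toy_rmap {
	unsigned long sptes[RMAP_MAX];	/* stand-ins for spte addresses */
	size_t nr;
	bool invalid;			/* the memslot is being deleted */
};

/* Adding to an invalid rmap is forbidden: no new translation may be
 * created for a memslot that is going away. */
static bool rmap_add(struct toy_rmap *r, unsigned long spte)
{
	if (r->invalid || r->nr == RMAP_MAX)
		return false;
	r->sptes[r->nr++] = spte;
	return true;
}

/* Removal is always allowed, even on an invalid rmap: zapping only
 * shrinks the list, which is what makes a remove-only rmap safe to
 * walk for unmapping outside of mmu-lock. */
static void rmap_remove(struct toy_rmap *r, unsigned long spte)
{
	for (size_t i = 0; i < r->nr; i++) {
		if (r->sptes[i] == spte) {
			r->sptes[i] = r->sptes[--r->nr];
			return;
		}
	}
}

/* A change_pte-style handler must not write a present spte back into an
 * invalid rmap; since rmap_add() refuses, the call degenerates to a
 * plain unmap for a memslot that is being deleted. */
static void notifier_change_pte(struct toy_rmap *r, unsigned long old_spte,
				unsigned long new_spte)
{
	rmap_remove(r, old_spte);
	rmap_add(r, new_spte);
}

int main(void)
{
	struct toy_rmap r = { .nr = 0, .invalid = false };

	rmap_add(&r, 0x1000);
	r.invalid = true;			/* memslot deletion begins */
	notifier_change_pte(&r, 0x1000, 0x2000);
	printf("entries left on the invalid rmap: %zu\n", r.nr);	/* 0 */
	return 0;
}
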
> >
> >> +
> >> + /*
> >> + * To ensure that all vcpus and mmu-notify are not clearing
> >> + * spte and rmap entry.
> >> + */
> >> + synchronize_srcu_expedited(&kvm->srcu);
> >> +}
> >> +
> >> #ifdef MMU_DEBUG
> >> static int is_empty_shadow_page(u64 *spt)
> >> {
> >> @@ -2219,6 +2283,11 @@ static void clear_sp_write_flooding_count(u64 *spte)
> >> __clear_sp_write_flooding_count(sp);
> >> }
> >>
> >> +static bool is_valid_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> >> +{
> >> + return likely(sp->mmu_valid_gen == kvm->arch.mmu_valid_gen);
> >> +}
> >> +
> >> static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> >> gfn_t gfn,
> >> gva_t gaddr,
> >> @@ -2245,6 +2314,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> >> role.quadrant = quadrant;
> >> }
> >> for_each_gfn_sp(vcpu->kvm, sp, gfn) {
> >> + if (!is_valid_sp(vcpu->kvm, sp))
> >> + continue;
> >> +
> >> if (!need_sync && sp->unsync)
> >> need_sync = true;
> >>
> >> @@ -2281,6 +2353,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
> >>
> >> account_shadowed(vcpu->kvm, gfn);
> >> }
> >> + sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
> >> init_shadow_page_table(sp);
> >> trace_kvm_mmu_get_page(sp, true);
> >> return sp;
> >> @@ -2451,8 +2524,12 @@ static int kvm_mmu_prepare_zap_page(struct kvm *kvm, struct kvm_mmu_page *sp,
> >> ret = mmu_zap_unsync_children(kvm, sp, invalid_list);
> >> kvm_mmu_page_unlink_children(kvm, sp);
> >
> > The rmaps[] arrays linked to !is_valid_sp() shadow pages should not be
> > accessed (as they have been freed already).
> >
> > I suppose the is_valid_sp() conditional below should be moved earlier,
> > before kvm_mmu_unlink_parents or any other rmap access.
> >
> > This is fine: the !is_valid_sp() shadow pages are only reachable
> > by SLAB and the hypervisor itself.
>
> Unfortunately we can not do this. :(
>
> The sptes in a shadow page can be linked to many slots. If a spte is linked
> to the rmap of the memslot being deleted, that is ok; otherwise, the rmap of
> a still-used memslot misses the update.
>
> For example, slot 0 is being deleted, sp->spte[0] is linked on slot[0].rmap,
> and sp->spte[1] is linked on slot[1].rmap. If we do not access the rmaps of
> this 'sp', the already-freed spte[1] stays linked on slot[1].rmap.
>
> We could let kvm update the rmap for sp->spte[1] and not unlink sp->spte[0].
> That is also not allowed, since the mmu-notifier can access the invalid rmap
> before the memslot is destroyed: it would then find an already-freed spte on
> the rmap, or the page's Accessed/Dirty state would be lost (if we made the
> mmu-notifier skip the invalid rmap instead).
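
(Restating the problem above with a toy model -- simplified types only, not
the actual kvm_mmu_page / rmap layout:)

#include <stdlib.h>

struct toy_sp {
	unsigned long spte[2];	/* spte[0] -> slot 0, spte[1] -> slot 1 */
};

struct toy_rmap {
	unsigned long *entry;	/* a single entry, for brevity */
};

/* rmap[0] belongs to the slot being deleted, rmap[1] to a live slot */
static struct toy_rmap rmap[2];

static void zap_sp(struct toy_sp *sp, int unlink)
{
	if (unlink) {
		/* the full rmap walk: drop every entry pointing into sp */
		rmap[0].entry = NULL;
		rmap[1].entry = NULL;
	}
	free(sp);
	/*
	 * Without the unlink step, rmap[1].entry still points at
	 * sp->spte[1], i.e. into freed memory: the next walk over the
	 * live slot's rmap hits a dangling pointer, and skipping that
	 * walk instead silently loses Accessed/Dirty tracking.
	 */
}

int main(void)
{
	struct toy_sp *sp = calloc(1, sizeof(*sp));

	rmap[0].entry = &sp->spte[0];
	rmap[1].entry = &sp->spte[1];
	zap_sp(sp, 1);		/* pass 0 to reproduce the dangling rmap */
	return 0;
}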

Why not release all rmaps?

Subject: [PATCH v2 3/7] KVM: x86: introduce kvm_clear_all_gfn_page_info

This function is used to reset the rmaps and page info of all guest pages;
it will be used in a later patch.

Signed-off-by: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxxxxxxxxx>
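
(A rough sketch of what releasing all rmaps could look like -- hypothetical
only, not the body of that patch; it assumes the 3.9-era x86 layout where a
memslot keeps its per-level rmap arrays in slot->arch.rmap[], and it ignores
the huge-page levels, lpage_info and the freeing of pte_list_desc chains:)

#include <linux/kvm_host.h>

/* Hypothetical sketch, not the real kvm_clear_all_gfn_page_info(). */
static void clear_all_rmaps_sketch(struct kvm *kvm)
{
	struct kvm_memory_slot *memslot;

	/* assumes the caller holds whatever locks protect memslots/rmaps */
	kvm_for_each_memslot(memslot, kvm->memslots) {
		/*
		 * Zero the 4K-level rmap heads (one per guest page) so no
		 * stale spte pointer survives; the real function would also
		 * have to free any pte_list_desc the heads point to and
		 * handle the remaining page-size levels.
		 */
		memset(memslot->arch.rmap[0], 0,
		       memslot->npages * sizeof(*memslot->arch.rmap[0]));
	}
}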
