Re: [PATCH 05/12] KVM: X86/MMU: Clear unsync bit directly in __mmu_unsync_walk()

From: Sean Christopherson
Date: Thu Jul 21 2022 - 12:27:05 EST


On Thu, Jul 21, 2022, Lai Jiangshan wrote:
> On Wed, Jul 20, 2022 at 3:52 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
>
> > > ---
> > > arch/x86/kvm/mmu/mmu.c | 22 +++++++++++++---------
> > > 1 file changed, 13 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> > > index f35fd5c59c38..2446ede0b7b9 100644
> > > --- a/arch/x86/kvm/mmu/mmu.c
> > > +++ b/arch/x86/kvm/mmu/mmu.c
> > > @@ -1794,19 +1794,23 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
> > >  				return -ENOSPC;
> > >
> > >  			ret = __mmu_unsync_walk(child, pvec);
> > > -			if (!ret) {
> > > -				clear_unsync_child_bit(sp, i);
> > > -				continue;
> > > -			} else if (ret > 0) {
> > > -				nr_unsync_leaf += ret;
> > > -			} else
> > > +			if (ret < 0)
> > >  				return ret;
> > > -		} else if (child->unsync) {
> > > +			nr_unsync_leaf += ret;
> > > +		}
> > > +
> > > +		/*
> > > +		 * Clear unsync bit for @child directly if @child is fully
> > > +		 * walked and all the unsync shadow pages descended from
> > > +		 * @child (including itself) are added into @pvec, the caller
> > > +		 * must sync or zap all the unsync shadow pages in @pvec.
> > > +		 */
> > > +		clear_unsync_child_bit(sp, i);
> > > +		if (child->unsync) {
> > >  			nr_unsync_leaf++;
> > >  			if (mmu_pages_add(pvec, child, i))
> >
> > This ordering is wrong, no? If the child itself is unsync and can't be added to
> > @pvec, i.e. fails here, then clearing its bit in unsync_child_bitmap is wrong.
>
> mmu_pages_add() always succeeds in adding the page to @pvec; the caller
> needs to guarantee there is enough room before calling it.
>
> When it returns true, it means the vector is now full, and adding more
> pages would fail.

Oof, that's downright evil. As prep work, can you fold in the attached patches
earlier in this series? Then this patch can yield:

	for_each_set_bit(i, sp->unsync_child_bitmap, 512) {
		struct kvm_mmu_page *child;
		u64 ent = sp->spt[i];

		if (!is_shadow_present_pte(ent) || is_large_pte(ent))
			goto clear_unsync_child;

		child = to_shadow_page(ent & SPTE_BASE_ADDR_MASK);
		if (!child->unsync && !child->unsync_children)
			goto clear_unsync_child;

		if (mmu_is_page_vec_full(pvec))
			return -ENOSPC;

		mmu_pages_add(pvec, child, i);

		if (child->unsync_children) {
			ret = __mmu_unsync_walk(child, pvec);
			if (!ret)
				goto clear_unsync_child;
			else if (ret > 0)
				nr_unsync_leaf += ret;
			else
				return ret;
		} else {
			nr_unsync_leaf++;
		}

clear_unsync_child:
		/*
		 * Clear the unsync info, the child is either already sync
		 * (bitmap is stale) or is guaranteed to be zapped/synced by
		 * the caller before mmu_lock is released.  Note, the caller is
		 * required to zap/sync all entries in @pvec even if an error
		 * is returned!
		 */
		clear_unsync_child_bit(sp, i);
	}