Re: reproducible BUG() in kvm_mmu_get_root() in TDP MMU

From: Sean Christopherson
Date: Tue Jan 05 2021 - 13:47:21 EST


On Tue, Jan 05, 2021, Paolo Bonzini wrote:
> On 05/01/21 18:49, Ben Gardon wrote:
> > for_each_tdp_mmu_root(kvm, root) {
> > kvm_mmu_get_root(kvm, root);
> > <Do something, yield the MMU lock>
> > kvm_mmu_put_root(kvm, root);
> > }
> >
> > In these cases the get and put root calls are there to ensure that the
> > root is not freed while the function is running, however they do this
> > too well. If the put root call reduces the root's root_count to 0, it
> > should be removed from the roots list and freed before the MMU lock is
> > released. However the above pattern never bothers to free the root.
> > The following would fix this bug:
> >
> > -kvm_mmu_put_root(kvm, root);
> > +if (kvm_mmu_put_root(kvm, root))
> > + kvm_tdp_mmu_free_root(kvm, root);
>
> Is it worth writing a more complex iterator struct, so that
> for_each_tdp_mmu_root takes care of the get and put?

Ya, and maybe with an "as_id" variant to avoid the get/put? Not sure that's a
worthwhile optimization though.

On a related topic, there are a few subtleties with respect to
for_each_tdp_mmu_root() that we should document/comment. The flows that drop
mmu_lock while iterating over the roots don't protect against the list itself
from being modified. E.g. the next entry could be deleted, or a new root
could be added. I think I've convinced myself that there are no existing bugs,
but we should document that the exact current behavior is required for
correctness.

The use of "unsafe" list_for_each_entry() in particular is unintuitive, as using
the "safe" variant would dereference a deleted entry in the "next entry is
deleted" scenario.

And regarding addomg a root, using list_add_tail() instead of list_add() in
get_tdp_mmu_vcpu_root() would cause iteration to visit a root that was added
after the iteration started (though I don't think this would ever be problematic
in practice?).