Re: [PATCH] Documentation: kvm: clarify SRCU locking order
From: Sean Christopherson
Date: Tue Jan 03 2023 - 12:42:43 EST
On Wed, Dec 28, 2022, Paolo Bonzini wrote:
> Currently only the locking order of SRCU vs kvm->slots_arch_lock
> and kvm->slots_lock is documented. Extend this to kvm->lock
> since Xen emulation got it terribly wrong.
>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---
> Documentation/virt/kvm/locking.rst | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> index 845a561629f1..a3ca76f9be75 100644
> --- a/Documentation/virt/kvm/locking.rst
> +++ b/Documentation/virt/kvm/locking.rst
> @@ -16,17 +16,26 @@ The acquisition orders for mutexes are as follows:
> - kvm->slots_lock is taken outside kvm->irq_lock, though acquiring
> them together is quite rare.
>
> -- Unlike kvm->slots_lock, kvm->slots_arch_lock is released before
> - synchronize_srcu(&kvm->srcu). Therefore kvm->slots_arch_lock
> - can be taken inside a kvm->srcu read-side critical section,
> - while kvm->slots_lock cannot.
> -
> - kvm->mn_active_invalidate_count ensures that pairs of
> invalidate_range_start() and invalidate_range_end() callbacks
> use the same memslots array. kvm->slots_lock and kvm->slots_arch_lock
> are taken on the waiting side in install_new_memslots, so MMU notifiers
> must not take either kvm->slots_lock or kvm->slots_arch_lock.
>
> +For SRCU:
> +
> +- ``synchronize_srcu(&kvm->srcu)`` is called _inside_
> + the kvm->slots_lock critical section, therefore kvm->slots_lock
> + cannot be taken inside a kvm->srcu read-side critical section.
> + Instead, kvm->slots_arch_lock is released before the call
> + to ``synchronize_srcu()`` and _can_ be taken inside a
> + kvm->srcu read-side critical section.
> +
> +- kvm->lock is taken inside kvm->srcu, therefore

Prior to the recent Xen change, is this actually true? There are many instances
where kvm->srcu is taken inside kvm->lock, but I can't find any existing cases
where the reverse is true. Logically, it makes sense to take kvm->lock first since
kvm->srcu can be taken deep in helpers, e.g. for accessing guest memory. It's also
more consistent to take kvm->lock first since kvm->srcu is taken inside vcpu->mutex,
and vcpu->mutex is taken inside kvm->lock.

Disallowing synchronize_srcu(&kvm->srcu) inside kvm->lock isn't problematic per se,
but it's going to result in a weird set of rules because synchronize_srcu() can be,
and is, called while holding a variety of other locks.

In other words, IMO taking kvm->srcu outside of kvm->lock in the Xen code is the
real bug.
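
To make the ordering argument above concrete, here is a rough userspace sketch,
not kernel code. It assigns a rank to each of the three locks discussed, with
kvm->lock outermost, then vcpu->mutex, then the kvm->srcu read side, and checks
whether an acquisition sequence nests in that order. The enum, the rank values,
and order_is_valid() are all hypothetical names invented for illustration; the
kernel tracks this with lockdep, not with anything like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical ranks for illustration only: a lower rank means the lock
 * is taken first (outermost). This encodes the ordering argued for in
 * this reply, not the ordering the patch documents.
 */
enum kvm_lock_rank {
	RANK_KVM_LOCK   = 0,	/* kvm->lock */
	RANK_VCPU_MUTEX = 1,	/* vcpu->mutex, taken inside kvm->lock */
	RANK_KVM_SRCU   = 2,	/* kvm->srcu read side, taken inside vcpu->mutex */
};

/*
 * Return true if the locks in seq[0..n-1], listed in acquisition order,
 * are taken strictly from outermost to innermost rank.
 */
bool order_is_valid(const enum kvm_lock_rank *seq, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		/* Taking an outer (or equal) rank while holding an inner one is an inversion. */
		if (seq[i] <= seq[i - 1])
			return false;
	}
	return true;
}
```

Under these assumed ranks, kvm->lock followed by vcpu->mutex followed by
kvm->srcu passes the check, while the Xen pattern of taking kvm->srcu first and
kvm->lock second is flagged as an inversion.
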
> + ``synchronize_srcu(&kvm->srcu)`` cannot be called inside
> + a kvm->lock critical section. If you cannot delay the
> + call until after kvm->lock is released, use ``call_srcu``.
> +
> On x86:
>
> - vcpu->mutex is taken outside kvm->arch.hyperv.hv_lock
> --
> 2.31.1
>