Re: [PATCH 1/1] s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

From: Tony Krowiak
Date: Thu Feb 11 2021 - 09:48:34 EST




On 2/11/21 7:23 AM, Cornelia Huck wrote:
On Wed, 10 Feb 2021 15:34:24 -0500
Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:

On 2/10/21 5:53 AM, Cornelia Huck wrote:
On Tue, 9 Feb 2021 14:48:30 -0500
Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
This patch fixes a circular locking dependency in the CI introduced by
commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated"). The lockdep only occurs when starting a Secure
Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
SE guests; however, in order to avoid CI errors, this fix is being
provided.

The circular lockdep was introduced when the masks in the guest's APCB
were taken under the matrix_dev->lock. While the lock is definitely
needed to protect the setting/unsetting of the KVM pointer, it is not
necessarily critical for setting the masks, so this will not be done under
protection of the matrix_dev->lock.

Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Tony Krowiak <akrowiak@xxxxxxxxxxxxx>
---
drivers/s390/crypto/vfio_ap_ops.c | 75 ++++++++++++++++++-------------
1 file changed, 45 insertions(+), 30 deletions(-)

static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
{
- kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
- matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
- vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
- kvm_put_kvm(matrix_mdev->kvm);
- matrix_mdev->kvm = NULL;
+ if (matrix_mdev->kvm) {
If you're doing setting/unsetting under matrix_dev->lock, is it
possible that matrix_mdev->kvm gets unset between here and the next
line, as you don't hold the lock?
That is highly unlikely because the only place the matrix_mdev->kvm
pointer is cleared is in this function which is called from only two
places: the notifier that handles the VFIO_GROUP_NOTIFY_SET_KVM
notification when the KVM pointer is cleared; the vfio_ap_mdev_release()
function which is called when the mdev fd is closed (i.e., when the guest
is shut down). The fact is, with the only end-to-end implementation
currently available, the notifier callback is never invoked to clear
the KVM pointer because the vfio_ap_mdev_release callback is
invoked first and it unregisters the notifier callback.

Having said that, I suppose there is no guarantee that there will not
be different userspace clients in the future that do things in a
different order. At the very least, it wouldn't hurt to protect against
that as you suggest below.
Yes, if userspace is able to use the interfaces in the certain way, we
should always make sure that nothing bad happens if it does so, even if
known userspace applications are well-behaved.

[Can we make an 'evil userspace' test program, maybe? The hardware
dependency makes this hard to run, though.]

Of course it is possible to create such a test program, but off the
top of my head, I can't come up with an algorithm that would
result in the scenario you have laid out. I haven't dabbled in the QEMU
space in quite some time; so, there would also be a bit of a re-learning
curve. I'm not sure it would be worth the effort to take this on given
how unlikely it is this scenario can happen, but I will take it into
consideration as it is a good idea.


Maybe you could
- grab a reference to kvm while holding the lock
- call the mask handling functions with that kvm reference
- lock again, drop the reference, and do the rest of the processing?
+ kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+ mutex_lock(&matrix_dev->lock);
+ matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+ vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+ kvm_put_kvm(matrix_mdev->kvm);
+ matrix_mdev->kvm = NULL;
+ mutex_unlock(&matrix_dev->lock);
+ }
}