Re: [PATCH] s390/vfio-ap: fix unregister GISC when KVM is already gone results in OOPS

From: Tony Krowiak
Date: Fri Sep 25 2020 - 18:29:25 EST

On 9/21/20 11:45 AM, Halil Pasic wrote:
On Fri, 18 Sep 2020 13:02:34 -0400
Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:

Attempting to unregister Guest Interruption Subclass (GISC) when the
link between the matrix mdev and KVM has been removed results in the
following:

"Kernel panic -not syncing: Fatal exception: panic_on_oops"

This patch fixes this bug by verifying the matrix mdev and KVM are still
linked prior to unregistering the GISC.

I read from your commit message that this happens when the link between
the KVM and the matrix mdev was established and then got severed.

I assume the interrupts were previously enabled, and were not disabled
or cleaned up, because q->saved_isc != VFIO_AP_ISC_INVALID.

That means the guest enabled interrupts and then for whatever
reason got destroyed, and this happens on mdev cleanup.

Does it happen all the time or is it some sort of a race?

This is a race condition that happens when a guest is terminated and the mdev is
removed in rapid succession. I came across it with one of my hades test cases
during cleanup of the resources after the test case completes. The problem appears
to be a bug in the vfio_ap_mdev_release function: it tries to reset the APQNs after
the bits have been cleared from matrix_mdev.matrix, so the resets never happen
(see the sketch below).
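
To make the ordering problem concrete, here is a minimal sketch (the toy_*
names are illustrative assumptions, not the actual vfio_ap_ops.c code) of why
clearing the matrix bits before walking them means nothing gets reset:

/*
 * Minimal sketch of the ordering bug described above. The names here
 * (toy_*) are illustrative assumptions, not the real driver code.
 */
#include <string.h>

#define TOY_APS   8
#define TOY_DOMS  8

struct toy_matrix {
        unsigned char apm[TOY_APS];     /* adapter (card) bits */
        unsigned char aqm[TOY_DOMS];    /* usage domain bits   */
};

/* stand-in for the per-queue reset */
static void toy_reset_queue(int ap, int aq)
{
        (void)ap;
        (void)aq;
}

/* resets only the APQNs whose bits are still set in the matrix */
static void toy_reset_queues(struct toy_matrix *m)
{
        int ap, aq;

        for (ap = 0; ap < TOY_APS; ap++)
                for (aq = 0; aq < TOY_DOMS; aq++)
                        if (m->apm[ap] && m->aqm[aq])
                                toy_reset_queue(ap, aq);
}

static void toy_release(struct toy_matrix *m)
{
        /* bug: the bits are cleared first ... */
        memset(m->apm, 0, sizeof(m->apm));
        memset(m->aqm, 0, sizeof(m->aqm));

        /* ... so this walk sees an empty matrix and no queue is reset */
        toy_reset_queues(m);
}

Doing the reset before clearing the bits would address this particular
ordering problem, though, as noted below, that alone does not make the
OOPS go away.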

Fixing that, however, does not resolve the issue, so I'm in the process of doing a bunch of
tracing to see the flow of the resets etc. over the lifecycle of the mdev during this
hades test. I should have a better answer next week.


Signed-off-by: Tony Krowiak <akrowiak@xxxxxxxxxxxxx>
---
drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++++-----
1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index e0bde8518745..847a88642644 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -119,11 +119,15 @@ static void vfio_ap_wait_for_irqclear(int apqn)
*/
static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q)
{
- if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev)
- kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc);
- if (q->saved_pfn && q->matrix_mdev)
- vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
- &q->saved_pfn, 1);
+ if (q->matrix_mdev) {
+ if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev->kvm)
+ kvm_s390_gisc_unregister(q->matrix_mdev->kvm,
+ q->saved_isc);
I don't quite understand the logic here. I suppose we need to ensure
that the struct kvm is 'alive' at least until kvm_s390_gisc_unregister()
is done. That is supposed to be ensured by kvm_get_kvm() in
vfio_ap_mdev_set_kvm() and kvm_put_kvm() in vfio_ap_mdev_release().

If the critical section in vfio_ap_mdev_release() is done and
matrix_mdev->kvm was set to NULL there then I would expect that the
queues are already reset and q->saved_isc == VFIO_AP_ISC_INVALID. So
this should not blow up.

Now if this happens before the critical section in
vfio_ap_mdev_release() is done, I ask myself how are we going to do the
kvm_put_kvm()?
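
For reference, here is a rough sketch of the get/put pairing and release
ordering I would expect (my assumptions about the shape of the code, not
the actual vfio_ap_ops.c implementation):

/*
 * Rough sketch of the expected kvm lifetime pairing; the toy_* and
 * sketch_* names are assumptions, not the real driver code.
 */
#include <linux/kvm_host.h>     /* struct kvm, kvm_get_kvm(), kvm_put_kvm() */

struct toy_mdev {
        struct kvm *kvm;
};

/* resets the queues and unregisters the GISC; needs a live m->kvm */
static void sketch_reset_queues(struct toy_mdev *m)
{
        /* would call kvm_s390_gisc_unregister(m->kvm, isc) per queue */
        (void)m;
}

/* link: pin the struct kvm for as long as the mdev references it */
static void sketch_set_kvm(struct toy_mdev *m, struct kvm *kvm)
{
        kvm_get_kvm(kvm);
        m->kvm = kvm;
}

/* unlink: per-queue cleanup that needs kvm must run before the put */
static void sketch_release(struct toy_mdev *m)
{
        struct kvm *kvm = m->kvm;

        sketch_reset_queues(m);         /* leaves saved_isc == INVALID */
        m->kvm = NULL;
        kvm_put_kvm(kvm);               /* drop the reference last     */
}

If the cleanup can run outside of that ordering (i.e. with matrix_mdev->kvm
already NULL but the queues not yet reset), the pairing above no longer
holds, which is the scenario the question above is about.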

Another question. Do we hold the matrix_dev->lock here?

+ if (q->saved_pfn)
+ vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev),
+ &q->saved_pfn, 1);
+ }
+
q->saved_pfn = 0;
q->saved_isc = VFIO_AP_ISC_INVALID;
}