Re: [PATCH v2 1/7] s390: vfio-ap: wait for queue empty on queue reset

From: Tony Krowiak
Date: Tue May 07 2019 - 11:13:51 EST


On 5/7/19 4:10 AM, Pierre Morel wrote:
On 06/05/2019 21:37, Tony Krowiak wrote:
On 5/6/19 2:41 AM, Pierre Morel wrote:
On 03/05/2019 23:14, Tony Krowiak wrote:
Refactors the AP queue reset function to wait until the queue is empty
after the PQAP(ZAPQ) instruction is executed to zero out the queue as
required by the AP architecture.

Signed-off-by: Tony Krowiak <akrowiak@xxxxxxxxxxxxx>
---
 drivers/s390/crypto/vfio_ap_ops.c | 35 ++++++++++++++++++++++++++++++++---
 1 file changed, 32 insertions(+), 3 deletions(-)

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 900b9cf20ca5..b88a2a2ba075 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -271,6 +271,32 @@ static int vfio_ap_mdev_verify_no_sharing(struct ap_matrix_mdev *matrix_mdev)
     return 0;
 }
+static void vfio_ap_mdev_wait_for_qempty(unsigned long apid, unsigned long apqi)
+{
+    struct ap_queue_status status;
+    ap_qid_t qid = AP_MKQID(apid, apqi);
+    int retry = 5;
+
+    do {
+        status = ap_tapq(qid, NULL);
+        switch (status.response_code) {
+        case AP_RESPONSE_NORMAL:
+            if (status.queue_empty)
+                return;
+            msleep(20);

NIT: Fall through?

Yes


+            break;
+        case AP_RESPONSE_RESET_IN_PROGRESS:
+        case AP_RESPONSE_BUSY:
+            msleep(20);
+            break;
+        default:
+            pr_warn("%s: tapq err %02x: %04lx.%02lx may not be empty\n",
+                __func__, status.response_code, apid, apqi);

I do not think the warning message is appropriate:
The only possible errors here are that the AP is unavailable because of an AP checkstop, a deconfigured AP, or an invalid APQN.

Right you are! I'll work on a new message.



+            return;
+        }
+    } while (--retry);
+}
+
 /**
  * assign_adapter_store
  *
@@ -790,15 +816,18 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
     return NOTIFY_OK;
 }
-static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
-                    unsigned int retry)
+int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi)
 {
     struct ap_queue_status status;
+    int retry = 5;
     do {
         status = ap_zapq(AP_MKQID(apid, apqi));
         switch (status.response_code) {
         case AP_RESPONSE_NORMAL:
+            vfio_ap_mdev_wait_for_qempty(apid, apqi);
+            return 0;
+        case AP_RESPONSE_DECONFIGURED:

Since you modify the switch, you can return for all the following cases:
AP_RESPONSE_DECONFIGURED
..._CHECKSTOP
..._INVALID_APQN


And you should wait for qempty on AP_RESPONSE_RESET_IN_PROGRESS along with AP_RESPONSE_NORMAL.

If a queue reset is in progress, we retry the zapq. Are you saying we
should wait for qempty then reissue the zapq?


Yes, I fear that if we reissue the zapq while a reset is in progress, we could end up in a loop, depending on how the hardware reset time compares with the software retry interval.

I already did this in the forthcoming v4 series.
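
For illustration, the suggestion above (wait for the queue to empty on both AP_RESPONSE_NORMAL and AP_RESPONSE_RESET_IN_PROGRESS instead of reissuing the ZAPQ, and return right away when the AP is unavailable) might look roughly like the user-space simulation below. This is only a sketch under stated assumptions, not the v4 code: every sim_* name, the outcome enum and the fake queue structure are hypothetical stand-ins for ap_zapq(), ap_tapq() and the AP response codes, and the msleep(20) between polls is left out.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the AP response codes discussed above. */
enum sim_rc {
    SIM_NORMAL,
    SIM_RESET_IN_PROGRESS,
    SIM_DECONFIGURED,
    SIM_CHECKSTOPPED,
    SIM_INVALID_APQN,
};

/* Hypothetical outcome type; the real function returns an int. */
enum sim_outcome { SIM_RESET_DONE, SIM_RESET_TIMEOUT, SIM_RESET_UNAVAILABLE };

/* Fake queue used in place of real hardware. */
struct sim_queue {
    enum sim_rc zapq_rc;      /* response code the simulated ZAPQ reports */
    int polls_until_empty;    /* TAPQ polls that still report "not empty" */
    int zapq_calls;           /* number of simulated ZAPQ invocations */
    int tapq_calls;           /* number of simulated TAPQ invocations */
};

static enum sim_rc sim_zapq(struct sim_queue *q)
{
    q->zapq_calls++;
    return q->zapq_rc;
}

/* Returns true once the simulated queue reports empty. */
static bool sim_tapq_empty(struct sim_queue *q)
{
    q->tapq_calls++;
    if (q->polls_until_empty > 0) {
        q->polls_until_empty--;
        return false;
    }
    return true;
}

/* Poll up to five times, as the patch does; the sleep is omitted here. */
static bool sim_wait_for_qempty(struct sim_queue *q)
{
    int retry = 5;

    do {
        if (sim_tapq_empty(q))
            return true;
    } while (--retry);
    return false;
}

static enum sim_outcome sim_reset_queue(struct sim_queue *q)
{
    switch (sim_zapq(q)) {
    case SIM_NORMAL:
    case SIM_RESET_IN_PROGRESS:
        /* A reset is running either way: poll rather than reissue ZAPQ. */
        return sim_wait_for_qempty(q) ? SIM_RESET_DONE : SIM_RESET_TIMEOUT;
    case SIM_DECONFIGURED:
    case SIM_CHECKSTOPPED:
    case SIM_INVALID_APQN:
    default:
        /* The AP is not available; retrying cannot help. */
        return SIM_RESET_UNAVAILABLE;
    }
}
```

In this sketch the ZAPQ is issued exactly once per call, so the loop feared above cannot occur; the only retry budget left is the one inside the empty-queue poll.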




             return 0;
         case AP_RESPONSE_RESET_IN_PROGRESS:
         case AP_RESPONSE_BUSY:

While you are modifying this function: AP_RESPONSE_BUSY is not a valid response code for ZAPQ, so you can remove it.

Okay


@@ -824,7 +853,7 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
                  matrix_mdev->matrix.apm_max + 1) {
         for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
                      matrix_mdev->matrix.aqm_max + 1) {
-            ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
+            ret = vfio_ap_mdev_reset_queue(apid, apqi);

IMHO, since you are changing this call anyway, passing the APQN as a parameter would be a good simplification.

Okay.

Sorry, I should have added: NIT.
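
To make the APQN suggestion concrete, here is a small stand-alone sketch of packing the adapter and queue index into one value and passing that around instead of two parameters. The SKETCH_* macros and sketch_split_qid() are hypothetical names, and the layout (adapter number in bits 8-15, queue index in the low 8 bits) is an assumption about how AP_MKQID packs its arguments, so check it against the real header before relying on it.

```c
/* Hypothetical stand-in for ap_qid_t. */
typedef unsigned int sketch_qid_t;

/* Assumed layout: adapter (APID) in bits 8-15, queue index (APQI) in bits 0-7. */
#define SKETCH_MKQID(apid, apqi) \
    ((((sketch_qid_t)(apid) & 0xffu) << 8) | ((sketch_qid_t)(apqi) & 0xffu))
#define SKETCH_QID_CARD(qid)  (((qid) >> 8) & 0xffu)
#define SKETCH_QID_QUEUE(qid) ((qid) & 0xffu)

/*
 * A callee that takes the packed value can still recover both halves,
 * for example to print them in a warning message.
 */
static void sketch_split_qid(sketch_qid_t qid, unsigned int *apid,
                             unsigned int *apqi)
{
    *apid = SKETCH_QID_CARD(qid);
    *apqi = SKETCH_QID_QUEUE(qid);
}
```

With this shape the caller builds the value once inside the nested loop and both the reset and the wait-for-empty helper take a single argument.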





             /*
              * Regardless whether a queue turns out to be busy, or
              * is not operational, we need to continue resetting

Depends on why the reset failed, but this is out of scope.

I'm not sure what you mean by out of scope here, but you do make a valid
point. If the response code for the zapq is AP_RESPONSE_DECONFIGURED,
there is probably no sense in continuing to reset queues for that
particular adapter. I'll consider a change here.

Yes, that was the point, but I consider this an enhancement; AFAIK, trying a reset on bad queues does no harm.

I included the enhancement in the forthcoming v4 series.
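
As a rough illustration of the enhancement discussed above (stop resetting the remaining queues of an adapter once a reset reports it deconfigured, but keep going for the other adapters), a user-space sketch could look like this. It is not the v4 code: the sketch_* names, the bool arrays standing in for the APM/AQM bitmaps, and the fake reset callback are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical result codes for a single queue reset. */
enum sketch_rc { SKETCH_OK, SKETCH_BUSY, SKETCH_DECONFIGURED };

typedef enum sketch_rc (*sketch_reset_fn)(unsigned int apid, unsigned int apqi);

/*
 * Walk every assigned adapter/queue pair. A failure never stops the whole
 * walk, but a deconfigured adapter skips the rest of its own queues.
 * Returns 1 if any reset failed, 0 otherwise.
 */
static int sketch_reset_queues(const bool *apm, unsigned int n_ap,
                               const bool *aqm, unsigned int n_aq,
                               sketch_reset_fn reset, unsigned int *attempts)
{
    int failed = 0;

    for (unsigned int apid = 0; apid < n_ap; apid++) {
        if (!apm[apid])
            continue;
        for (unsigned int apqi = 0; apqi < n_aq; apqi++) {
            enum sketch_rc rc;

            if (!aqm[apqi])
                continue;
            (*attempts)++;
            rc = reset(apid, apqi);
            if (rc == SKETCH_OK)
                continue;
            failed = 1;
            if (rc == SKETCH_DECONFIGURED)
                break; /* no point in trying this adapter's other queues */
        }
    }
    return failed;
}

/* Fake reset used only for demonstration: adapter 1 is deconfigured. */
static enum sketch_rc sketch_fake_reset(unsigned int apid, unsigned int apqi)
{
    (void)apqi;
    return apid == 1 ? SKETCH_DECONFIGURED : SKETCH_OK;
}
```

With adapters 0 and 1 assigned and three queues each, the fake reset above is attempted three times for adapter 0 and only once for adapter 1.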