Re: [PATCH v11 01/14] s390/vfio-ap: No need to disable IRQ after queue reset

From: Tony Krowiak
Date: Mon Nov 02 2020 - 09:36:02 EST




On 10/30/20 11:43 PM, Halil Pasic wrote:
On Fri, 30 Oct 2020 16:37:04 -0400
Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:

On 10/30/20 1:42 PM, Halil Pasic wrote:
On Thu, 29 Oct 2020 19:29:35 -0400
Tony Krowiak <akrowiak@xxxxxxxxxxxxx> wrote:
@@ -1177,7 +1166,10 @@ static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
*/
if (ret)
rc = ret;
- vfio_ap_irq_disable_apqn(AP_MKQID(apid, apqi));
+ q = vfio_ap_get_queue(matrix_mdev,
+ AP_MKQID(apid, apqi));
+ if (q)
+ vfio_ap_free_aqic_resources(q);
Is it safe to do vfio_ap_free_aqic_resources() at this point? I don't
think so. I mean does the current code (and vfio_ap_mdev_reset_queue()
in particular guarantee that the reset is actually done when we arrive
here)? BTW, I think we have a similar problem with the current code as
well.
If the return code from the vfio_ap_mdev_reset_queue() function
is zero, then yes, we are guaranteed the reset was done and the
queue is empty.
I've read up on this and I disagree. We should discuss this offline.
Maybe you are confusing things here; my statement is specific to the return
code from the vfio_ap_mdev_reset_queue() function, not the response code
from the PQAP(ZAPQ) instruction. The vfio_ap_mdev_reset_queue()
function issues the PQAP(ZAPQ) instruction and if the status response code
is 0 indicating the reset was successfully initiated, it waits for the
queue to empty. When the queue is empty, it returns 0 to indicate
the queue is reset.
If the queue does not become empty after a period of
time,
it will issue a warning (WARN_ON_ONCE) and return 0. In that case, I suppose
there is no guarantee the reset was done, so maybe a change needs to be
made there such as a non-zero return code.

I've overlooked the wait for empty. Maybe that return 0 had a part in
it. I now remember me insisting on having the wait code added when the
interrupt support was in the make. Sorry!

If we have given up on out of retries retries, we are in trouble anyway.
  The function returns a non-zero return code if
the reset fails or the queue the reset did not complete within a given
amount of time, so maybe we shouldn't free AQIC resources when
we get a non-zero return code from the reset function?
If the queue is gone, or broken, it won't produce interrupts or poke the
notifier bit, and we should clean up the AQIC resources.
True, which is what the code provided by this patch does; however,
the AQIC resources should be cleaned up only if the KVM pointer is
not NULL for reasons discussed elsewhere.
Yes, but these should be cleaned up before the KVM pointer becomes
null. We don't want to keep the page with the notifier byte pinned
forever, or?

No, we do not want to keep the page forever. I probably should
have been clearer. There are times we do a reset - e.g., on remove
of the mdev - at which time there should be no KVM pointer, or
else the remove will not be allowed. Of course, we won't do the
reset either, so I guess you can ignore my comment. If there is
no KVM pointer yet a page remains pinned, something bad
happened.