Re: [PATCH v4 04/14] KVM: s390: pv: avoid stalls when making pages secure

From: Christian Borntraeger
Date: Tue Aug 31 2021 - 11:11:25 EST




On 31.08.21 17:00, Claudio Imbrenda wrote:
On Tue, 31 Aug 2021 16:32:24 +0200
Christian Borntraeger <borntraeger@xxxxxxxxxx> wrote:

On 18.08.21 15:26, Claudio Imbrenda wrote:
Improve make_secure_pte to avoid stalls when the system is heavily
overcommitted. This was especially problematic in kvm_s390_pv_unpack,
because of the loop over all pages that needed unpacking.

Due to the locks being held, it was not possible to simply replace
uv_call with uv_call_sched. A more complex approach was
needed, in which uv_call is replaced with __uv_call, which does not
loop. When the UVC needs to be executed again, -EAGAIN is returned, and
the caller (or its caller) will try again.

When -EAGAIN is returned, the path is the same as when the page is in
writeback (and the writeback check is also performed, which is
harmless).

To me it looks like
handle_pv_uvc does not handle EAGAIN but also calls into this code. Is this code
path ok or do we need to change something here?

EAGAIN will be propagated all the way to userspace, which will retry.

if the UVC fails, the page does not get unpinned, and the next attempt
to run the UVC in the guest will trigger this same path.

if you don't like it, I can change handle_pv_uvc like this

if (rc == -EINVAL || rc == -EAGAIN)

which will save a trip to userspace

I think a comment would be good.



Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
Fixes: 214d9bbcd3a672 ("s390/mm: provide memory management functions for protected KVM guests")
---
arch/s390/kernel/uv.c | 29 +++++++++++++++++++++++------
1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index aeb0a15bcbb7..68a8fbafcb9c 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -180,7 +180,7 @@ static int make_secure_pte(pte_t *ptep, unsigned long addr,
{
pte_t entry = READ_ONCE(*ptep);
struct page *page;
- int expected, rc = 0;
+ int expected, cc = 0;
if (!pte_present(entry))
return -ENXIO;
@@ -196,12 +196,25 @@ static int make_secure_pte(pte_t *ptep, unsigned long addr,
if (!page_ref_freeze(page, expected))
return -EBUSY;
set_bit(PG_arch_1, &page->flags);
- rc = uv_call(0, (u64)uvcb);
+ /*
+ * If the UVC does not succeed or fail immediately, we don't want to
+ * loop for long, or we might get stall notifications.
+ * On the other hand, this is a complex scenario and we are holding a lot of
+ * locks, so we can't easily sleep and reschedule. We try only once,
+ * and if the UVC returned busy or partial completion, we return
+ * -EAGAIN and we let the callers deal with it.
+ */
+ cc = __uv_call(0, (u64)uvcb);
page_ref_unfreeze(page, expected);
- /* Return -ENXIO if the page was not mapped, -EINVAL otherwise */
- if (rc)
- rc = uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
- return rc;
+ /*
+ * Return -ENXIO if the page was not mapped, -EINVAL for other errors.
+ * If busy or partially completed, return -EAGAIN.
+ */
+ if (cc == UVC_CC_OK)
+ return 0;
+ else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
+ return -EAGAIN;
+ return uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
}
/*
@@ -254,6 +267,10 @@ int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
mmap_read_unlock(gmap->mm);
if (rc == -EAGAIN) {
+ /*
+ * If we are here because the UVC returned busy or partial
+ * completion, this is just a useless check, but it is safe.
+ */
wait_on_page_writeback(page);
} else if (rc == -EBUSY) {
/*