Re: [PATCH v3 3/3] vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

From: John Hubbard
Date: Mon May 25 2020 - 20:46:42 EST

On 2020-05-25 17:37, Jason Gunthorpe wrote:
commit 318b275fbca1ab9ec0862de71420e0e92c3d1aa7
Author: Gleb Natapov <gleb@xxxxxxxxxx>
Date: Tue Mar 22 16:30:51 2011 -0700

mm: allow GUP to fail instead of waiting on a page

GUP user may want to try to acquire a reference to a page if it is already
in memory, but not if IO, to bring it in, is needed. For example KVM may
tell vcpu to schedule another guest process if current one is trying to
access swapped out page. Meanwhile, the page will be swapped in and the
guest process, that depends on it, will be able to run again.

This patch adds FAULT_FLAG_RETRY_NOWAIT (suggested by Linus) and
FOLL_NOWAIT follow_page flags. FAULT_FLAG_RETRY_NOWAIT, when used in
conjunction with VM_FAULT_ALLOW_RETRY, indicates to handle_mm_fault that
it shouldn't drop mmap_sem and wait on a page, but return VM_FAULT_RETRY

So, from kvm's perspective it was to avoid excessively long blocking in
common paths when it could rejoin the completed IO by somehow waiting
on a page itself?

Or perhaps some variation on that, such as just retrying with an intervening
schedule() call. It's not clear just from that commit.

It all seems like it should not be used unless the page is going to go
to IO?

That's my conclusion so far, yes.

Certainly there is no reason to optimize the fringe case of vfio
sleeping if there is and incorrect concurrnent attempt to disable the
a BAR.

Definitely agree with that position.

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 8429d5aa31e44..e32e8e52a57ac 100644
+++ b/include/linux/mm.h
@@ -430,6 +430,15 @@ extern pgprot_t protection_map[16];
* continuous faults with flags (b). We should always try to detect pending
* signals before a retry to make sure the continuous page faults can still be
* interrupted if necessary.
+ *
+ * About @FAULT_FLAG_RETRY_NOWAIT: this is intended for callers who would like
+ * to acquire a page, but only if the page is already in memory. If, on the
+ * other hand, the page requires IO in order to bring it into memory, then fault
+ * handlers will immediately return VM_FAULT_RETRY ("don't wait"), while leaving
+ * mmap_lock held ("don't drop mmap_lock"). For example, this is useful for
+ * virtual machines that have multiple guests running: if guest A attempts
+ * get_user_pages() on a swapped out page, another guest can be scheduled while
+ * waiting for IO to swap in guest A's page.
#define FAULT_FLAG_WRITE 0x01

It seems reasonable but people might complain about the kvm
specifics of the explanation.

It might be better to explain how the caller is supposed to know when
it is OK to try GUP again and expect success, as it seems to me this
is really about externalizing the sleep for page wait?

OK, good point. The example was helpful in the commit log, but not quite
appropriate in mm.h, yes.

John Hubbard