Re: [PATCH v3 4/5] guest_memfd: add support for userfaultfd minor mode

From: Nikita Kalyazin
Date: Mon Dec 01 2025 - 08:40:22 EST




On 30/11/2025 11:18, Mike Rapoport wrote:
From: "Mike Rapoport (Microsoft)" <rppt@xxxxxxxxxx>

userfaultfd notifications about minor page faults are used for live migration
and snapshotting of VMs with memory backed by shared hugetlbfs or tmpfs
mappings as described in detail in commit 7677f7fd8be7 ("userfaultfd: add
minor fault registration mode").

To use the same mechanism for VMs that use guest_memfd to map their memory,
guest_memfd should support userfaultfd minor mode.

Extend the ->fault() method of guest_memfd with the ability to notify the
core page fault handler that a page fault requires
handle_userfault(VM_UFFD_MINOR) to complete, and add an implementation of
->get_folio_noalloc() to the guest_memfd vm_ops.

Reviewed-by: Liam R. Howlett <Liam.Howlett@xxxxxxxxxx>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@xxxxxxxxxx>
---
virt/kvm/guest_memfd.c | 33 ++++++++++++++++++++++++++++++++-
1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index ffadc5ee8e04..dca6e373937b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -4,6 +4,7 @@
#include <linux/kvm_host.h>
#include <linux/pagemap.h>
#include <linux/anon_inodes.h>
+#include <linux/userfaultfd_k.h>

#include "kvm_mm.h"

@@ -359,7 +360,15 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
if (!((u64)inode->i_private & GUEST_MEMFD_FLAG_INIT_SHARED))
return VM_FAULT_SIGBUS;

- folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+ folio = filemap_lock_folio(inode->i_mapping, vmf->pgoff);
+ if (!IS_ERR_OR_NULL(folio) && userfaultfd_minor(vmf->vma)) {
+ ret = VM_FAULT_UFFD_MINOR;
+ goto out_folio;
+ }

I realised that I might have been wrong in [1] when I said that the noalloc get_folio was OK for our use case. Unfortunately, we rely on a minor fault being generated even when the page is being allocated. Peter and I originally discussed this in [2]. Since we want to populate guest memory on demand with content supplied by userspace, we have to be able to intercept the very first access, which means we need either a minor or a major UFFD event for it. At the time, we decided to use the minor event. If we have to preserve the shmem semantics, we are forced to implement support for major faults/UFFDIO_COPY.

[1] https://lore.kernel.org/all/4405c306-9d7c-4fd6-9ea6-2ed1b73f5c2e@xxxxxxxxxx
[2] https://lore.kernel.org/kvm/Z9HhTjEWtM58Zfxf@x1.local
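
To make the concern concrete, here is a minimal userspace model of the decision taken in the hunk above. It is a sketch, not kernel code: the enum, the function names and the boolean inputs are all hypothetical and exist only for illustration. model_wanted_fault() is one reading of the behaviour described in this reply, not an existing implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for this model only; these are not kernel types. */
enum fault_outcome {
	OUTCOME_UFFD_MINOR,	/* userspace gets a minor-fault notification */
	OUTCOME_ALLOCATED,	/* a new folio is allocated, no notification */
	OUTCOME_MAPPED,		/* an existing folio is mapped, no notification */
};

/*
 * Models the quoted hunk: a notification is raised only when the folio is
 * already in the page cache and the VMA is registered for minor mode.
 */
enum fault_outcome model_patched_fault(bool folio_in_cache, bool uffd_minor)
{
	if (folio_in_cache && uffd_minor)
		return OUTCOME_UFFD_MINOR;
	if (!folio_in_cache)
		return OUTCOME_ALLOCATED;	/* the -ENOENT fallback path */
	return OUTCOME_MAPPED;
}

/*
 * Models the behaviour this reply asks for (an assumption drawn from the
 * text above): with minor mode registered, every fault notifies userspace,
 * including the one that would otherwise allocate the folio.
 */
enum fault_outcome model_wanted_fault(bool folio_in_cache, bool uffd_minor)
{
	if (uffd_minor)
		return OUTCOME_UFFD_MINOR;
	return folio_in_cache ? OUTCOME_MAPPED : OUTCOME_ALLOCATED;
}

/* True when the first touch of a never-populated page reaches userspace. */
bool first_access_intercepted(enum fault_outcome (*fault)(bool, bool))
{
	return fault(false, true) == OUTCOME_UFFD_MINOR;
}
```

In this model, first_access_intercepted(model_patched_fault) is false, because the first touch of a page that is not yet in the page cache takes the allocation path silently. That first touch is the access the use case above needs to observe.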

+
+ if (PTR_ERR(folio) == -ENOENT)
+ folio = kvm_gmem_get_folio(inode, vmf->pgoff);
+
if (IS_ERR(folio)) {
int err = PTR_ERR(folio);

@@ -390,8 +399,30 @@ static vm_fault_t kvm_gmem_fault_user_mapping(struct vm_fault *vmf)
return ret;
}

+#ifdef CONFIG_USERFAULTFD
+static struct folio *kvm_gmem_get_folio_noalloc(struct inode *inode,
+ pgoff_t pgoff)
+{
+ struct folio *folio;
+
+ folio = filemap_lock_folio(inode->i_mapping, pgoff);
+ if (IS_ERR_OR_NULL(folio))
+ return folio;
+
+ if (!folio_test_uptodate(folio)) {
+ clear_highpage(folio_page(folio, 0));
+ kvm_gmem_mark_prepared(folio);
+ }
+
+ return folio;
+}
+#endif
+
static const struct vm_operations_struct kvm_gmem_vm_ops = {
.fault = kvm_gmem_fault_user_mapping,
+#ifdef CONFIG_USERFAULTFD
+ .get_folio_noalloc = kvm_gmem_get_folio_noalloc,
+#endif
};

static int kvm_gmem_mmap(struct file *file, struct vm_area_struct *vma)
--
2.51.0