Re: [PATCH v2 13/15] KVM: guest_memfd: implement userfaultfd operations
From: Mike Rapoport
Date: Fri Mar 27 2026 - 07:55:21 EST
On Thu, Mar 26, 2026 at 07:33:03PM -0700, James Houghton wrote:
> On Fri, Mar 6, 2026 at 9:19 AM Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> >
> > From: Nikita Kalyazin <kalyazin@xxxxxxxxxx>
> >
> > userfaultfd notifications about page faults used for live migration
> > and snapshotting of VMs.
> >
> > MISSING mode allows post-copy live migration and MINOR mode allows
> > optimization for post-copy live migration for VMs backed with shared
> > hugetlbfs or tmpfs mappings as described in detail in commit
> > 7677f7fd8be7 ("userfaultfd: add minor fault registration mode").
> >
> > To use the same mechanisms for VMs that use guest_memfd to map their
> > memory, guest_memfd should support userfaultfd operations.
> >
> > Add implementation of vm_uffd_ops to guest_memfd.
> >
> > Signed-off-by: Nikita Kalyazin <kalyazin@xxxxxxxxxx>
> > Co-developed-by: Mike Rapoport (Microsoft) <rppt@xxxxxxxxxx>
> > Signed-off-by: Mike Rapoport (Microsoft) <rppt@xxxxxxxxxx>
>
> Overall looks fine to me, but I am slightly concerned about in-place
> conversion[1], and I think you're going to want to implement a
> kvm_gmem_folio_present() op or something (like I was saying on the
> previous patch[2]).
Let's solve each problem in it's time :)
> [1]: https://lore.kernel.org/kvm/20260326-gmem-inplace-conversion-v4-0-e202fe950ffd@xxxxxxxxxx/
> [2]: https://lore.kernel.org/linux-mm/CADrL8HVUJ5FL97d9ytxp2WXos6HS+U+ycpsi5VxffsW9vacr9Q@xxxxxxxxxxxxxx/
>
> Some in-line comments below.
>
> > ---
> > mm/filemap.c | 1 +
> > virt/kvm/guest_memfd.c | 84 +++++++++++++++++++++++++++++++++++++++++-
> > 2 files changed, 83 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 6cd7974d4ada..19dfcebcd23f 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -107,6 +108,12 @@ static int kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slot *slot,
> > return __kvm_gmem_prepare_folio(kvm, slot, index, folio);
> > }
> >
> > +static struct folio *kvm_gmem_get_folio_noalloc(struct inode *inode, pgoff_t pgoff)
> > +{
> > + return __filemap_get_folio(inode->i_mapping, pgoff,
> > + FGP_LOCK | FGP_ACCESSED, 0);
> > +}
>
> When in-place conversion is supported, I wonder what the semantics
> should be for when we get userfaults.
>
> Upon a userspace access to a file offset that is populated but
> private, should we get a userfault or a SIGBUS?
>
> I guess getting a userfault is strictly more useful for userspace, but
> I'm not sure which choice is more correct.
Me neither :)
We can deliver userfault, but just block UFFDIO_COPY, can't we?
> > +static int kvm_gmem_filemap_add(struct folio *folio,
> > + struct vm_area_struct *vma,
> > + unsigned long addr)
> > +{
> > + struct inode *inode = file_inode(vma->vm_file);
> > + struct address_space *mapping = inode->i_mapping;
> > + pgoff_t pgoff = linear_page_index(vma, addr);
> > + int err;
> > +
> > + __folio_set_locked(folio);
> > + err = filemap_add_folio(mapping, folio, pgoff, GFP_KERNEL);
>
> This is going to get more interesting with in-place conversion. I'm
> not really sure how to synchronize with it, but we'll probably need to
> take the invalidate lock for reading. And then we'll need a separate
> uffd_op to drop it after we install the PTE... I think.
I think we can start simple and then move on along with the in-place
conversion work. If there will be a need for a new uffd_ops callback we can
add it then.
--
Sincerely yours,
Mike.