Re: [RFC PATCH] mm/hmm: Add userfaultfd support to fault handling

From: Stanislav Kinsburskii

Date: Thu Apr 30 2026 - 13:38:36 EST

On Wed, Apr 22, 2026 at 08:14:24PM +0200, David Hildenbrand (Arm) wrote:
> On 4/22/26 16:59, Stanislav Kinsburskii wrote:
> > On Wed, Apr 22, 2026 at 09:27:44AM +0200, David Hildenbrand (Arm) wrote:
> >> On 4/1/26 00:24, Stanislav Kinsburskii wrote:
> >>> Add support for userfaultfd-enabled VMAs to the HMM framework.
> >>>
> >>> Extract fault handling logic into hmm_handle_mm_fault() to handle both
> >>> regular and userfaultfd-backed mappings. The implementation follows
> >>> fixup_user_fault() for handling VM_FAULT_RETRY and VM_FAULT_COMPLETED, with
> >>> a key difference: instead of retrying or moving forward respectively,
> >>> return -EBUSY after reacquiring mmap_read_lock. Since the lock was
> >>> released, the VMA could have changed, so defer retry logic to the caller.
> >>>
> >>> This approach is inefficient for userfaultfd-backed VMAs, as HMM can only
> >>> populate one page at a time, but keeps the framework simple by avoiding
> >>> complex retry logic within HMM itself.
> >>>
> >>> Signed-off-by: Stanislav Kinsburskii <skinsburskii@xxxxxxxxxxxxxxxxxxx>
> >>> ---
> >>> mm/hmm.c | 40 ++++++++++++++++++++++++++++++++++++----
> >>> 1 file changed, 36 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/mm/hmm.c b/mm/hmm.c
> >>> index f6c4ddff4bd6..d04d68e21473 100644
> >>> --- a/mm/hmm.c
> >>> +++ b/mm/hmm.c
> >>> @@ -59,6 +59,35 @@ static int hmm_pfns_fill(unsigned long addr, unsigned long end,
> >>> return 0;
> >>> }
> >>>
> >>> +static int hmm_handle_mm_fault(struct vm_area_struct *vma,
> >>> + unsigned long addr,
> >>> + unsigned int fault_flags)
> >>> +{
> >>> + int ret;
> >>> +
> >>> + if (userfaultfd_missing(vma)) {
> >>> + struct mm_struct *mm = vma->vm_mm;
> >>> +
> >>> + fault_flags |= FAULT_FLAG_ALLOW_RETRY |
> >>> + FAULT_FLAG_USER;
> >>> +
> >>> + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
> >>> +
> >>> + if (ret & (VM_FAULT_COMPLETED | VM_FAULT_RETRY)) {
> >>> + mmap_read_lock(mm);
> >>> + return -EBUSY;
> >>> + }
> >>> +
> >>> + if (ret & VM_FAULT_ERROR)
> >>> + return vm_fault_to_errno(ret, 0);
> >>> + } else {
> >>> + ret = handle_mm_fault(vma, addr, fault_flags, NULL);
> >>> + if (ret & VM_FAULT_ERROR)
> >>> + return vm_fault_to_errno(ret, 0);
> >>> + }
> >>
> >> I'm surprised that there is userfaultfd_missing() logic required here at
> >> all.
> >>
> >> What prevents us from always calling handle_mm_fault() in a way +
> >> handling return values, such that we will just do the right thing
> >> independent of userfaultfd_missing()?
> >>
> >
> > Essentially, the main issue with the current HMM framework is that
> > handle_mm_fault() with FAULT_FLAG_ALLOW_RETRY (needed for userfaultfd)
> > can drop the mm lock. The framework does not expect or support that yet.
> > That is why I had to add the branching.
> >
> > The worst part is that handle_mm_fault() can unlock via
> > mm_read_unlock(), while svm_range_restore_work() takes the mm write
> > lock and this RFC patch doesn't address this problem.
> >
> > The main question where I’d like feedback is: should userfaultfd be
> > supported in the HMM framework at all, given the complexity around
> > unlocking and retry handling (similar to other gup helpers) or should
> > this complexity be offloaded to the caller side?
> >
> > If it should be suported, then I see two ways to implement this:
> >
> > 1. Introduce a new HMM_NEED_USER_FAULT flag to tell the framework that
> > the caller needs userfaultfd support, and extend the framework
> > accordingly. This is not intrusive to existing code. But if we want lazy
> > migration for GPU state in the future (and we likely do), GPU drivers
> > will have to be updated to set this flag later. The only real user of
> > this feature right now is the MSHV driver and I'd extend it to set
> > this flag right away and leave GPU HMM users for later.
> >
> > 2. Add retry logic to hmm_handle_mm_fault() to handle VM_FAULT_RETRY and
> > VM_FAULT_COMPLETED, as required for userfaultfd support. This would be
> > transparent to users. But it would require significant changes: mm
> > relocking, VMA lookup, and other parts. Also, before doing that, the
> > kfd_svm driver would need to be changed to downgrade the mm lock to
> > read.
>
> Exactly this. If we want this, it should be done the proper way, by teaching
> calling code that the lock was dropped etc.
>
> There should not be userfaultfd special-casing anywhere, it's just a matter of
> teaching the code that we might get the mmap lock dropped.
>
> Now, that might be a bit of work :)
>
> What we do in mm/gup.c, is letting callers opt-in whether they can handle
> getting the mmap lock dropped -- not opting in to userfaultfd.
>
> See FOLL_UNLOCKABLE. Such an approach enables a step-wise implementation.
> But note that FOLL_UNLOCKABLE is an internal flag. We set it automatically when
> someone passes us a "locked" variable, indicating that they can deal with it.
>
>
> See populate_vma_page_range() as one example.
>

Okay. So, here is how I underestand what you propose: I should introduce
a new hmm helper function, hmm_range_fault_unlockable(), that would
accept an additional argument, "locked", to indicate whether the caller
can handle the mm lock being dropped. This function would implement the
retry logic for VM_FAULT_RETRY and VM_FAULT_COMPLETED, and the existing
hmm_range_fault() would call hmm_range_fault_unlockable() with
locked=false, while the MSHV driver would call it with locked=true. This
way, we can keep the userfaultfd logic within the hmm framework, while
allowing for a step-wise implementation.

Does this sound reasonable to you?

Thanks,
Stanislav

> --
> Cheers,
>
> David