Re: [PATCH] mm/memory: refactor finish_fault

From: Lorenzo Stoakes

Date: Tue Jun 30 2026 - 06:01:47 EST

On Tue, Jun 30, 2026 at 03:51:53PM +0800, Lance Yang wrote:
>
> On Wed, Jun 24, 2026 at 03:50:47PM +0530, Sarthak Sharma wrote:
> >finish_fault() currently has a goto fallback implementation
> >where we try to map a large folio with PTEs. If that cannot be
> >installed, we goto fallback and go through the fallback mapping
> >path again. This looks weird and is tough to comprehend.
> >
> >Remove the goto fallback implementation and try to map the
> >whole folio if allowed. If the whole folio cannot be mapped,
> >fall back to single page mapping without repeating the whole
> >function.
> >
> >The cleanup of finish_fault() was suggested by David in [1].
> >
> >[1] https://lore.kernel.org/all/3684c55a-6581-4731-b94a-19526f455a1e@xxxxxxxxxx/
> >
> >Suggested-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
> >Signed-off-by: Sarthak Sharma <sarthak.sharma@xxxxxxx>
> >---
> >Tested this patch by running mm selftests on baseline and patched 7.1
> >kernels. No regressions were observed.
> >
> > mm/memory.c | 78 ++++++++++++++++++++++++++++-------------------------
> > 1 file changed, 41 insertions(+), 37 deletions(-)
> >
> >diff --git a/mm/memory.c b/mm/memory.c
> >index 86a973119bd4..f883b3f3850e 100644
> >--- a/mm/memory.c
> >+++ b/mm/memory.c
> >@@ -5666,18 +5666,16 @@ static bool vmf_pte_changed(struct vm_fault *vmf)
> > vm_fault_t finish_fault(struct vm_fault *vmf)
> > {
> > struct vm_area_struct *vma = vmf->vma;
> >- struct page *page;
> >+ struct page *page, *map_page;
> > struct folio *folio;
> > vm_fault_t ret;
> > bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> > !(vma->vm_flags & VM_SHARED);
> >- int type, nr_pages;
> >- unsigned long addr;
> >+ pte_t *fault_pte, *start_pte;
> >+ int type, nr_pages, nr_ptes;
> >+ unsigned long start_addr;
> > bool needs_fallback = false;
> >
> >-fallback:
> >- addr = vmf->address;
> >-
> > /* Did we COW the page? */
> > if (is_cow)
> > page = vmf->cow_page;
> >@@ -5695,7 +5693,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> > return ret;
> > }
> >
> >- if (!needs_fallback && vma->vm_file) {
> >+ if (vma->vm_file) {
> > struct address_space *mapping = vma->vm_file->f_mapping;
> > pgoff_t file_end;
> >
> >@@ -5727,56 +5725,62 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> >
> > nr_pages = folio_nr_pages(folio);
> >
> >+ /* Initialize for single page mapping */
> >+ map_page = page;
> >+ start_addr = vmf->address;
> >+ nr_ptes = 1;
> >+
> > /* Using per-page fault to maintain the uffd semantics */
> >- if (unlikely(userfaultfd_armed(vma)) || unlikely(needs_fallback)) {
> >- nr_pages = 1;
> >- } else if (nr_pages > 1) {
> >- pgoff_t idx = folio_page_idx(folio, page);
> >- /* The page offset of vmf->address within the VMA. */
> >- pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
> >+ if (nr_pages > 1 && likely(!needs_fallback) && likely(!userfaultfd_armed(vma))) {
> >+ unsigned long fault_page_idx = folio_page_idx(folio, page);
> >+ unsigned long pages_to_folio_end = nr_pages - fault_page_idx;
> >+
> >+ bool fits_vma = folio_within_vma(folio, vma);
> >+
> > /* The index of the entry in the pagetable for fault page. */
> > pgoff_t pte_off = pte_index(vmf->address);
> >
> >- /*
> >- * Fallback to per-page fault in case the folio size in page
> >- * cache beyond the VMA limits and PMD pagetable limits.
> >- */
> >- if (unlikely(vma_off < idx ||
> >- vma_off + (nr_pages - idx) > vma_pages(vma) ||
> >- pte_off < idx ||
> >- pte_off + (nr_pages - idx) > PTRS_PER_PTE)) {
> >- nr_pages = 1;
>
> TL;DR: nothing looks broken to me here. Just wanted to spelling out what
> this relies on :)
>
> Old code above had four range checks before installing multiple PTEs:
>
> 1) vma_off < idx
> 2) vma_off + (nr_pages - idx) > vma_pages(vma)
> 3) pte_off < idx
> 4) pte_off + (nr_pages - idx) > PTRS_PER_PTE
>
> PTE-table part looks equivalent after refactor. 3) and 4) become:
>
> 3) pte_off >= fault_page_idx
> 4) pte_off + pages_to_folio_end <= PTRS_PER_PTE
>
> where fault_page_idx is old idx, and pages_to_folio_end is old
> nr_pages - idx.
>
> Only VMA part looks different to me. Old code checked same range, based
> on fault address, that it later installed:
>
> [vma_off - idx, vma_off + (nr_pages - idx))
>
> After refactor, folio_within_vma() checks folio range:
>
> [folio->index, folio->index + folio_nr_pages(folio))
>
> while installed range is still based on fault address:
>
> [vmf->pgoff - fault_page_idx, vmf->pgoff - fault_page_idx + nr_pages)
>
> So these line up when:
>
> vmf->pgoff == folio->index + fault_page_idx
>
> Looks fine for page-cache users, since vmf->page comes from
> folio_file_page(folio, vmf->pgoff).

This to me all strongly suggests a comment might be helpful here :)

>
> [...]
>
> Cheers, Lance

Cheers, Lorenzo