Re: [PATCH] mm/memory: refactor finish_fault

From: Lance Yang

Date: Tue Jun 30 2026 - 03:52:57 EST

On Wed, Jun 24, 2026 at 03:50:47PM +0530, Sarthak Sharma wrote:
>finish_fault() currently has a goto fallback implementation
>where we try to map a large folio with PTEs. If that cannot be
>installed, we goto fallback and go through the fallback mapping
>path again. This looks weird and is tough to comprehend.
>
>Remove the goto fallback implementation and try to map the
>whole folio if allowed. If the whole folio cannot be mapped,
>fall back to single page mapping without repeating the whole
>function.
>
>The cleanup of finish_fault() was suggested by David in [1].
>
>[1] https://lore.kernel.org/all/3684c55a-6581-4731-b94a-19526f455a1e@xxxxxxxxxx/
>
>Suggested-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
>Signed-off-by: Sarthak Sharma <sarthak.sharma@xxxxxxx>
>---
>Tested this patch by running mm selftests on baseline and patched 7.1
>kernels. No regressions were observed.
>
> mm/memory.c | 78 ++++++++++++++++++++++++++++-------------------------
> 1 file changed, 41 insertions(+), 37 deletions(-)
>
>diff --git a/mm/memory.c b/mm/memory.c
>index 86a973119bd4..f883b3f3850e 100644
>--- a/mm/memory.c
>+++ b/mm/memory.c
>@@ -5666,18 +5666,16 @@ static bool vmf_pte_changed(struct vm_fault *vmf)
> vm_fault_t finish_fault(struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
>- struct page *page;
>+ struct page *page, *map_page;
> struct folio *folio;
> vm_fault_t ret;
> bool is_cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> !(vma->vm_flags & VM_SHARED);
>- int type, nr_pages;
>- unsigned long addr;
>+ pte_t *fault_pte, *start_pte;
>+ int type, nr_pages, nr_ptes;
>+ unsigned long start_addr;
> bool needs_fallback = false;
>
>-fallback:
>- addr = vmf->address;
>-
> /* Did we COW the page? */
> if (is_cow)
> page = vmf->cow_page;
>@@ -5695,7 +5693,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
> return ret;
> }
>
>- if (!needs_fallback && vma->vm_file) {
>+ if (vma->vm_file) {
> struct address_space *mapping = vma->vm_file->f_mapping;
> pgoff_t file_end;
>
>@@ -5727,56 +5725,62 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
>
> nr_pages = folio_nr_pages(folio);
>
>+ /* Initialize for single page mapping */
>+ map_page = page;
>+ start_addr = vmf->address;
>+ nr_ptes = 1;
>+
> /* Using per-page fault to maintain the uffd semantics */
>- if (unlikely(userfaultfd_armed(vma)) || unlikely(needs_fallback)) {
>- nr_pages = 1;
>- } else if (nr_pages > 1) {
>- pgoff_t idx = folio_page_idx(folio, page);
>- /* The page offset of vmf->address within the VMA. */
>- pgoff_t vma_off = vmf->pgoff - vmf->vma->vm_pgoff;
>+ if (nr_pages > 1 && likely(!needs_fallback) && likely(!userfaultfd_armed(vma))) {
>+ unsigned long fault_page_idx = folio_page_idx(folio, page);
>+ unsigned long pages_to_folio_end = nr_pages - fault_page_idx;
>+
>+ bool fits_vma = folio_within_vma(folio, vma);
>+
> /* The index of the entry in the pagetable for fault page. */
> pgoff_t pte_off = pte_index(vmf->address);
>
>- /*
>- * Fallback to per-page fault in case the folio size in page
>- * cache beyond the VMA limits and PMD pagetable limits.
>- */
>- if (unlikely(vma_off < idx ||
>- vma_off + (nr_pages - idx) > vma_pages(vma) ||
>- pte_off < idx ||
>- pte_off + (nr_pages - idx) > PTRS_PER_PTE)) {
>- nr_pages = 1;

TL;DR: nothing looks broken to me here. Just wanted to spelling out what
this relies on :)

Old code above had four range checks before installing multiple PTEs:

1) vma_off < idx
2) vma_off + (nr_pages - idx) > vma_pages(vma)
3) pte_off < idx
4) pte_off + (nr_pages - idx) > PTRS_PER_PTE

PTE-table part looks equivalent after refactor. 3) and 4) become:

3) pte_off >= fault_page_idx
4) pte_off + pages_to_folio_end <= PTRS_PER_PTE

where fault_page_idx is old idx, and pages_to_folio_end is old
nr_pages - idx.

Only VMA part looks different to me. Old code checked same range, based
on fault address, that it later installed:

[vma_off - idx, vma_off + (nr_pages - idx))

After refactor, folio_within_vma() checks folio range:

[folio->index, folio->index + folio_nr_pages(folio))

while installed range is still based on fault address:

[vmf->pgoff - fault_page_idx, vmf->pgoff - fault_page_idx + nr_pages)

So these line up when:

vmf->pgoff == folio->index + fault_page_idx

Looks fine for page-cache users, since vmf->page comes from
folio_file_page(folio, vmf->pgoff).

[...]

Cheers, Lance