Re: [PATCH 5/9] mm/rmap: batch unmap folios belonging to uffd-wp VMAs
From: Lorenzo Stoakes (Oracle)
Date: Tue Mar 10 2026 - 04:34:44 EST
On Tue, Mar 10, 2026 at 01:00:09PM +0530, Dev Jain wrote:
> The ptes are all the same w.r.t belonging to the same type of VMA, and
> being marked with uffd-wp or all being not marked. Therefore we can batch
> set uffd-wp markers through install_uffd_wp_ptes_if_needed, and enable
> batched unmapping of folios belonging to uffd-wp VMAs by dropping that
> condition from folio_unmap_pte_batch.
>
> It may happen that we don't batch over the entire folio in one go, in which
> case, we must skip over the current batch. Add a helper to do that -
> page_vma_mapped_walk_jump() will increment the relevant fields of pvmw
> by nr pages.
>
> I think that we can get away with just incrementing pvmw->pte
> and pvmw->address, since looking at the code in page_vma_mapped.c,
> pvmw->pfn and pvmw->nr_pages are used in conjunction, and pvmw->pgoff
> and pvmw->nr_pages (in vma_address_end()) are used in conjunction,
> cancelling out the increment and decrement in the respective fields. But
> let us not rely on the pvmw implementation and keep this simple.
This isn't simple...
>
> Export this function to rmap.h to enable future reuse.
>
> Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
> ---
> include/linux/rmap.h | 10 ++++++++++
> mm/rmap.c | 8 +++-----
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 8dc0871e5f001..1b7720c66ac87 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -892,6 +892,16 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
> spin_unlock(pvmw->ptl);
> }
>
> +static inline void page_vma_mapped_walk_jump(struct page_vma_mapped_walk *pvmw,
> + unsigned int nr)
unsigned long nr_pages... 'nr' is meaningless and you're mixing + matching types
for no reason.
> +{
> + pvmw->pfn += nr;
> + pvmw->nr_pages -= nr;
> + pvmw->pgoff += nr;
> + pvmw->pte += nr;
> + pvmw->address += nr * PAGE_SIZE;
> +}
I absolutely hate this. It's extremely confusing, especially since you're now
going from looking at 1 page to nr_pages - 1, jump doesn't really mean anything
here, you're losing sight of the batch size and exposing a silly detail to the
caller, and I really don't want to 'export' this at this time.
If we must have this, can you please make it static in rmap.c at least for the
time being.
Or perhaps instead, have a batched variant of page_vma_mapped_walk(), like
page_vma_mapped_walk_batch()?
I think that makes a lot more sense...
I mean I kind of hate the pvmw interface in general, this is a hack to handle
batching clamped on to the side of it, let's figure out how to do this sensibly
and do what's needed rather than adding yet more hacks-on-hacks please.
> +
> /**
> * page_vma_mapped_walk_restart - Restart the page table walk.
> * @pvmw: Pointer to struct page_vma_mapped_walk.
> diff --git a/mm/rmap.c b/mm/rmap.c
> index a7570cd037344..dd638429c963e 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1953,9 +1953,6 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
> if (pte_unused(pte))
> return 1;
>
> - if (userfaultfd_wp(vma))
> - return 1;
> -
> /*
> * If unmap fails, we need to restore the ptes. To avoid accidentally
> * upgrading write permissions for ptes that were not originally
> @@ -2235,7 +2232,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> * we may want to replace a none pte with a marker pte if
> * it's file-backed, so we don't lose the tracking info.
> */
> - install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, 1);
> + install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, nr_pages);
>
> /* Update high watermark before we lower rss */
> update_hiwater_rss(mm);
> @@ -2359,8 +2356,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> * If we are sure that we batched the entire folio and cleared
> * all PTEs, we can just optimize and stop right here.
> */
> - if (nr_pages == folio_nr_pages(folio))
> + if (likely(nr_pages == folio_nr_pages(folio)))
Please don't add random likely()'s based on what you think is likely(). This
kind of thing should only be done based on profiling.
> goto walk_done;
> + page_vma_mapped_walk_jump(&pvmw, nr_pages - 1);
(You're now passing a signed long to an unsigned int...!)
> continue;
> walk_abort:
> ret = false;
> --
> 2.34.1
>
Thanks, Lorenzo