Re: [PATCH 5/9] mm/rmap: batch unmap folios belonging to uffd-wp VMAs
From: Dev Jain
Date: Wed Mar 11 2026 - 00:56:32 EST
On 10/03/26 2:04 pm, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 10, 2026 at 01:00:09PM +0530, Dev Jain wrote:
>> The PTEs in a batch all belong to the same VMA, so they are either all
>> uffd-wp marked or all unmarked. Therefore we can batch-set uffd-wp
>> markers through install_uffd_wp_ptes_if_needed(), and enable batched
>> unmapping of folios belonging to uffd-wp VMAs by dropping that
>> condition from folio_unmap_pte_batch().
>>
>> It may happen that we do not batch over the entire folio in one go, in
>> which case we must skip past the current batch. Add a helper to do
>> that: page_vma_mapped_walk_jump() increments the relevant fields of
>> pvmw by nr pages.
>>
>> We could probably get away with incrementing just pvmw->pte and
>> pvmw->address: looking at the code in page_vma_mapped.c, pvmw->pfn is
>> only ever used in conjunction with pvmw->nr_pages, as is pvmw->pgoff
>> (in vma_address_end()), so the increment and the decrement in the
>> respective fields cancel out. But let us not rely on the pvmw
>> implementation details and keep this simple.
>
> This isn't simple...
>
>>
>> Export this function to rmap.h to enable future reuse.
>>
>> Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
>> ---
>> include/linux/rmap.h | 10 ++++++++++
>> mm/rmap.c | 8 +++-----
>> 2 files changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index 8dc0871e5f001..1b7720c66ac87 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -892,6 +892,16 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
>> spin_unlock(pvmw->ptl);
>> }
>>
>> +static inline void page_vma_mapped_walk_jump(struct page_vma_mapped_walk *pvmw,
>> + unsigned int nr)
>
> unsigned long nr_pages... 'nr' is meaningless and you're mixing + matching types
> for no reason.
>
>> +{
>> + pvmw->pfn += nr;
>> + pvmw->nr_pages -= nr;
>> + pvmw->pgoff += nr;
>> + pvmw->pte += nr;
>> + pvmw->address += nr * PAGE_SIZE;
>> +}
>
> I absolutely hate this. It's extremely confusing, especially since you're
> now going from looking at 1 page to nr_pages - 1. 'jump' doesn't really
> mean anything here, you're losing sight of the batch size and exposing a
> silly detail to the caller, and I really don't want to 'export' this at
> this time.
>
> If we must have this, can you please make it static in rmap.c at least for the
> time being.
>
> Or perhaps instead, have a batched variant of page_vma_mapped_walk(), like
> page_vma_mapped_walk_batch()?
>
> I think that makes a lot more sense...
>
> I mean, I kind of hate the pvmw interface in general; this is a hack to
> handle batching clamped onto the side of it. Let's figure out how to do
> this sensibly and do what's needed, rather than adding yet more
> hacks-on-hacks, please.
>
>> +
>> /**
>> * page_vma_mapped_walk_restart - Restart the page table walk.
>> * @pvmw: Pointer to struct page_vma_mapped_walk.
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index a7570cd037344..dd638429c963e 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -1953,9 +1953,6 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
>> if (pte_unused(pte))
>> return 1;
>>
>> - if (userfaultfd_wp(vma))
>> - return 1;
>> -
>> /*
>> * If unmap fails, we need to restore the ptes. To avoid accidentally
>> * upgrading write permissions for ptes that were not originally
>> @@ -2235,7 +2232,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>> * we may want to replace a none pte with a marker pte if
>> * it's file-backed, so we don't lose the tracking info.
>> */
>> - install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, 1);
>> + install_uffd_wp_ptes_if_needed(vma, address, pvmw.pte, pteval, nr_pages);
>>
>> /* Update high watermark before we lower rss */
>> update_hiwater_rss(mm);
>> @@ -2359,8 +2356,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>> * If we are sure that we batched the entire folio and cleared
>> * all PTEs, we can just optimize and stop right here.
>> */
>> - if (nr_pages == folio_nr_pages(folio))
>> + if (likely(nr_pages == folio_nr_pages(folio)))
>
> Please don't add random likely()'s based on what you think is likely(). This
> kind of thing should only be done based on profiling.
Okay.
>
>> goto walk_done;
>> + page_vma_mapped_walk_jump(&pvmw, nr_pages - 1);
>
> (You're now passing a signed long to an unsigned int...!)
Will change all instances of nr_pages to unsigned long.
>
>
>> continue;
>> walk_abort:
>> ret = false;
>> --
>> 2.34.1
>>
>
> Thanks, Lorenzo