Re: [PATCH 5/9] mm/rmap: batch unmap folios belonging to uffd-wp VMAs
From: Barry Song
Date: Tue Mar 10 2026 - 19:32:45 EST
On Tue, Mar 10, 2026 at 4:34 PM Lorenzo Stoakes (Oracle) <ljs@xxxxxxxxxx> wrote:
>
> On Tue, Mar 10, 2026 at 01:00:09PM +0530, Dev Jain wrote:
> > The ptes are all the same w.r.t belonging to the same type of VMA, and
> > being marked with uffd-wp or all being not marked. Therefore we can batch
> > set uffd-wp markers through install_uffd_wp_ptes_if_needed, and enable
> > batched unmapping of folios belonging to uffd-wp VMAs by dropping that
> > condition from folio_unmap_pte_batch.
> >
> > It may happen that we don't batch over the entire folio in one go, in which
> > case, we must skip over the current batch. Add a helper to do that -
> > page_vma_mapped_walk_jump() will increment the relevant fields of pvmw
> > by nr pages.
> >
> > I think that we can get away with just incrementing pvmw->pte
> > and pvmw->address, since looking at the code in page_vma_mapped.c,
> > pvmw->pfn and pvmw->nr_pages are used in conjunction, and pvmw->pgoff
> > and pvmw->nr_pages (in vma_address_end()) are used in conjunction,
> > cancelling out the increment and decrement in the respective fields. But
> > let us not rely on the pvmw implementation and keep this simple.
>
> This isn't simple...
>
> >
> > Export this function to rmap.h to enable future reuse.
> >
> > Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
> > ---
> > include/linux/rmap.h | 10 ++++++++++
> > mm/rmap.c | 8 +++-----
> > 2 files changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index 8dc0871e5f001..1b7720c66ac87 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -892,6 +892,16 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
> > spin_unlock(pvmw->ptl);
> > }
> >
> > +static inline void page_vma_mapped_walk_jump(struct page_vma_mapped_walk *pvmw,
> > + unsigned int nr)
>
> unsigned long nr_pages... 'nr' is meaningless and you're mixing + matching types
> for no reason.
>
> > +{
> > + pvmw->pfn += nr;
> > + pvmw->nr_pages -= nr;
> > + pvmw->pgoff += nr;
> > + pvmw->pte += nr;
> > + pvmw->address += nr * PAGE_SIZE;
> > +}
>
> I absolutely hate this. It's extremely confusing, especially since you're now
> going from looking at 1 page to nr_pages - 1, jump doesn't really mean anything
> here, you're losing sight of the batch size and exposing a silly detail to the
> caller, and I really don't want to 'export' this at this time.
I’m fairly sure I raised the same concern when Dev first suggested this,
but somehow it seems my comment was completely overlooked. :-)
>
> If we must have this, can you please make it static in rmap.c at least for the
> time being.
>
> Or perhaps instead, have a batched variant of page_vma_mapped_walk(), like
> page_vma_mapped_walk_batch()?
Right now we face the same issue for non-anon pages, but
page_vma_mapped_walk() can skip those PTEs once it finds that the
nr - 1 PTEs are none:
next_pte:
	do {
		pvmw->address += PAGE_SIZE;
		if (pvmw->address >= end)
			return not_found(pvmw);
		/* Did we cross page table boundary? */
		if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
			if (pvmw->ptl) {
				spin_unlock(pvmw->ptl);
				pvmw->ptl = NULL;
			}
			pte_unmap(pvmw->pte);
			pvmw->pte = NULL;
			pvmw->flags |= PVMW_PGTABLE_CROSSED;
			goto restart;
		}
		pvmw->pte++;
	} while (pte_none(ptep_get(pvmw->pte)));
The difference now is that swap entries cannot be skipped that way.
If we do want a `page_vma_mapped_walk_batch()`, I suppose it could
look like this?
bool page_vma_mapped_walk_batch(struct page_vma_mapped_walk *pvmw,
				unsigned long nr)
{
	...
}

static inline bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
{
	return page_vma_mapped_walk_batch(pvmw, 1);
}
Thanks
barry