Re: [PATCH v5 5/5] mm: rmap: support batched unmapping for file large folios

From: Baolin Wang

Date: Tue Jan 06 2026 - 21:29:29 EST




On 1/7/26 10:21 AM, Barry Song wrote:
On Wed, Jan 7, 2026 at 2:46 PM Wei Yang <richard.weiyang@xxxxxxxxx> wrote:

On Wed, Jan 07, 2026 at 10:29:25AM +1300, Barry Song wrote:
On Wed, Jan 7, 2026 at 2:22 AM Wei Yang <richard.weiyang@xxxxxxxxx> wrote:

On Fri, Dec 26, 2025 at 02:07:59PM +0800, Baolin Wang wrote:
Similar to folio_referenced_one(), we can apply batched unmapping to file
large folios to speed up file folio reclaim.

Barry previously implemented batched unmapping for lazyfree anonymous large
folios[1] and did not further optimize anonymous large folios or file-backed
large folios at that stage. As for file-backed large folios, the batched
unmapping support is relatively straightforward, as we only need to clear
the consecutive (present) PTE entries for file-backed large folios.

Performance testing:
Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to
reclaim 8G file-backed folios via the memory.reclaim interface. I can observe
75% performance improvement on my Arm64 32-core server (and 50%+ improvement
on my X86 machine) with this patch.

W/o patch:
real 0m1.018s
user 0m0.000s
sys 0m1.018s

W/ patch:
real 0m0.249s
user 0m0.000s
sys 0m0.249s

[1] https://lore.kernel.org/all/20250214093015.51024-4-21cnbao@xxxxxxxxx/T/#u
Reviewed-by: Ryan Roberts <ryan.roberts@xxxxxxx>
Acked-by: Barry Song <baohua@xxxxxxxxxx>
Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
---
mm/rmap.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 985ab0b085ba..e1d16003c514 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1863,9 +1863,10 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
end_addr = pmd_addr_end(addr, vma->vm_end);
max_nr = (end_addr - addr) >> PAGE_SHIFT;

- /* We only support lazyfree batching for now ... */
- if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
+ /* We only support lazyfree or file folios batching for now ... */
+ if (folio_test_anon(folio) && folio_test_swapbacked(folio))
return 1;
+
if (pte_unused(pte))
return 1;

@@ -2231,7 +2232,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
*
* See Documentation/mm/mmu_notifier.rst
*/
- dec_mm_counter(mm, mm_counter_file(folio));
+ add_mm_counter(mm, mm_counter_file(folio), -nr_pages);
}
discard:
if (unlikely(folio_test_hugetlb(folio))) {
--
2.47.3
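
To make the changed condition concrete, here is a minimal userspace sketch
(not kernel code) of the old and new early-return checks in
folio_unmap_pte_batch(). struct folio_model, can_batch_old(), can_batch_new()
and check_batch_rules() are hypothetical names used only for illustration; the
two booleans stand in for folio_test_anon() and folio_test_swapbacked(). The
later checks in the real function (pte_unused() and so on) are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two folio flags the check looks at. */
struct folio_model {
	bool anon;       /* models folio_test_anon(folio) */
	bool swapbacked; /* models folio_test_swapbacked(folio) */
};

/* Old check: "return 1" unless the folio is lazyfree (anon && !swapbacked). */
bool can_batch_old(struct folio_model f)
{
	if (!f.anon || f.swapbacked)
		return false;
	return true;
}

/* New check: "return 1" only for swap-backed anonymous folios. */
bool can_batch_new(struct folio_model f)
{
	if (f.anon && f.swapbacked)
		return false;
	return true;
}

/* Walk the relevant flag combinations and assert the expected outcome. */
void check_batch_rules(void)
{
	struct folio_model file     = { .anon = false, .swapbacked = false };
	struct folio_model lazyfree = { .anon = true,  .swapbacked = false };
	struct folio_model anon_sb  = { .anon = true,  .swapbacked = true  };

	assert(!can_batch_old(file));     /* file folios: not batched before */
	assert(can_batch_new(file));      /* ... batched with the patch */
	assert(can_batch_old(lazyfree));  /* lazyfree: batched before and after */
	assert(can_batch_new(lazyfree));
	assert(!can_batch_old(anon_sb));  /* swap-backed anon: never batched */
	assert(!can_batch_new(anon_sb));
}
```

Calling check_batch_rules() from a small main() passes silently. The only case
whose result differs between the two checks here is the non-anonymous folio.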


Hi, Baolin

When reading your patch, I came up with one small question.

The current try_to_unmap_one() has the following structure:

try_to_unmap_one()
    while (page_vma_mapped_walk(&pvmw)) {
        nr_pages = folio_unmap_pte_batch()

        if (nr_pages == folio_nr_pages(folio))
            goto walk_done;
    }

I am wondering what happens if nr_pages > 1 but nr_pages != folio_nr_pages(folio).

If my understanding is correct, page_vma_mapped_walk() would start from
(pvmw->address + PAGE_SIZE) in the next iteration, but we have already cleared
up to (pvmw->address + nr_pages * PAGE_SIZE), right?

I am not sure my understanding is correct. If it is, is there a reason not to
skip the cleared range?

I don’t quite understand your question. For nr_pages > 1 but not equal
to folio_nr_pages(folio), page_vma_mapped_walk() will skip the nr_pages - 1
cleared PTEs.

Take a look:

next_pte:
	do {
		pvmw->address += PAGE_SIZE;
		if (pvmw->address >= end)
			return not_found(pvmw);
		/* Did we cross page table boundary? */
		if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
			if (pvmw->ptl) {
				spin_unlock(pvmw->ptl);
				pvmw->ptl = NULL;
			}
			pte_unmap(pvmw->pte);
			pvmw->pte = NULL;
			pvmw->flags |= PVMW_PGTABLE_CROSSED;
			goto restart;
		}
		pvmw->pte++;
	} while (pte_none(ptep_get(pvmw->pte)));


Yes, we do it in page_vma_mapped_walk() now. Since they are pte_none(), they
will be skipped.
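
As an illustration of the behaviour described above, here is a small userspace
model (not kernel code) of a batched clear followed by the next_pte loop. The
names struct walk_model, MODEL_NR_PTES, model_next_pte(), model_unmap_batch()
and model_walk_selfcheck() are hypothetical; a bool array stands in for the
page table, and locking and the PMD-boundary restart are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PTES 16 /* hypothetical size of the modelled PTE range */

struct walk_model {
	bool present[MODEL_NR_PTES]; /* false models pte_none() */
	size_t idx;                  /* models pvmw->address / pvmw->pte */
	size_t checks;               /* number of pte_none()-style tests */
};

/* Models the next_pte loop: advance one entry at a time past empty slots. */
bool model_next_pte(struct walk_model *w)
{
	do {
		w->idx++;
		if (w->idx >= MODEL_NR_PTES)
			return false; /* models not_found(pvmw) */
		w->checks++;
	} while (!w->present[w->idx]);
	return true;
}

/* Models a batched unmap: clear nr consecutive entries starting at idx. */
void model_unmap_batch(struct walk_model *w, size_t nr)
{
	for (size_t i = 0; i < nr && w->idx + i < MODEL_NR_PTES; i++)
		w->present[w->idx + i] = false;
}

/* Clear a batch of 4 at index 0, then resume the walk. */
void model_walk_selfcheck(void)
{
	struct walk_model w = { .idx = 0, .checks = 0 };
	bool found;

	for (size_t i = 0; i < MODEL_NR_PTES; i++)
		w.present[i] = true;

	model_unmap_batch(&w, 4);
	found = model_next_pte(&w);

	assert(found);
	assert(w.idx == 4);    /* landed on the first entry after the batch */
	assert(w.checks == 4); /* entries 1..3 were empty, entry 4 was present */
	(void)found;
}
```

In this model the walker steps over the three already-cleared entries on its
own, at the cost of one emptiness test per entry.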

I mean maybe we can skip it in try_to_unmap_one(), for example:

diff --git a/mm/rmap.c b/mm/rmap.c
index 9e5bd4834481..ea1afec7c802 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2250,6 +2250,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
*/
if (nr_pages == folio_nr_pages(folio))
goto walk_done;
+ else {
+ pvmw.address += PAGE_SIZE * (nr_pages - 1);
+ pvmw.pte += nr_pages - 1;
+ }
continue;
walk_abort:
ret = false;


I feel this couples the PTE walk iteration with the unmap
operation, which does not seem fine to me. It also appears
to affect only corner cases.

Agreed. There may be no performance gain, so I also prefer to leave it as is.
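
For a rough sense of what is at stake, the sketch below (userspace only, not
kernel code) counts the emptiness tests needed to reach the first entry after
a partially batched range, with and without the caller advancing past the
batch. model_checks_after_batch() and its parameters are hypothetical, and the
model assumes every entry outside the cleared batch is present.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Count pte_none()-style tests needed to find the next present entry after
 * nr entries were cleared starting at index start, in a range of total
 * entries. skip_in_caller models advancing pvmw.address and pvmw.pte by
 * nr - 1 in try_to_unmap_one() before continuing the walk.
 */
size_t model_checks_after_batch(size_t start, size_t nr, size_t total,
				bool skip_in_caller)
{
	size_t idx = start;
	size_t checks = 0;

	assert(nr >= 1);

	if (skip_in_caller)
		idx += nr - 1;

	while (++idx < total) {
		checks++;
		if (idx >= start + nr) /* first entry beyond the cleared batch */
			break;
	}
	return checks;
}
```

With start = 0, nr = 4 and total = 16, the function returns 4 without the skip
and 1 with it, so the difference is nr - 1 tests per partially batched range
in this model.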