Re: [PATCH v6 1/5] mm: rmap: support batched checks of the references for large folios
From: Barry Song
Date: Fri Mar 06 2026 - 16:07:45 EST
On Mon, Feb 9, 2026 at 10:07 PM Baolin Wang
<baolin.wang@xxxxxxxxxxxxxxxxx> wrote:
>
> Currently, folio_referenced_one() always checks the young flag for each PTE
> sequentially, which is inefficient for large folios. This inefficiency is
> especially noticeable when reclaiming clean file-backed large folios, where
> folio_referenced() is observed as a significant performance hotspot.
>
> Moreover, on Arm64 architecture, which supports contiguous PTEs, there is already
> an optimization to clear the young flags for PTEs within a contiguous range.
> However, this is not sufficient. We can extend this to perform batched operations
> for the entire large folio (which might exceed the contiguous range: CONT_PTE_SIZE).
>
> Introduce a new API: clear_flush_young_ptes() to facilitate batched checking
> of the young flags and flushing TLB entries, thereby improving performance
> during large folio reclamation. And it will be overridden by the architecture
> that implements a more efficient batch operation in the following patches.
>
> While we are at it, rename ptep_clear_flush_young_notify() to
> clear_flush_young_ptes_notify() to indicate that this is a batch operation.
>
> Reviewed-by: Harry Yoo <harry.yoo@xxxxxxxxxx>
> Reviewed-by: Ryan Roberts <ryan.roberts@xxxxxxx>
> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>

LGTM,

Reviewed-by: Barry Song <baohua@xxxxxxxxxx>

> ---
> include/linux/mmu_notifier.h | 9 +++++----
> include/linux/pgtable.h | 35 +++++++++++++++++++++++++++++++++++
> mm/rmap.c | 28 +++++++++++++++++++++++++---
> 3 files changed, 65 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
> index d1094c2d5fb6..07a2bbaf86e9 100644
> --- a/include/linux/mmu_notifier.h
> +++ b/include/linux/mmu_notifier.h
> @@ -515,16 +515,17 @@ static inline void mmu_notifier_range_init_owner(
> range->owner = owner;
> }
>
> -#define ptep_clear_flush_young_notify(__vma, __address, __ptep) \
> +#define clear_flush_young_ptes_notify(__vma, __address, __ptep, __nr) \
> ({ \
> int __young; \
> struct vm_area_struct *___vma = __vma; \
> unsigned long ___address = __address; \
> - __young = ptep_clear_flush_young(___vma, ___address, __ptep); \
> + unsigned int ___nr = __nr; \
> + __young = clear_flush_young_ptes(___vma, ___address, __ptep, ___nr); \
> __young |= mmu_notifier_clear_flush_young(___vma->vm_mm, \
> ___address, \
> ___address + \
> - PAGE_SIZE); \
> + ___nr * PAGE_SIZE); \
> __young; \
> })
>
> @@ -650,7 +651,7 @@ static inline void mmu_notifier_subscriptions_destroy(struct mm_struct *mm)
>
> #define mmu_notifier_range_update_to_read_only(r) false
>
> -#define ptep_clear_flush_young_notify ptep_clear_flush_young
> +#define clear_flush_young_ptes_notify clear_flush_young_ptes
> #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
> #define ptep_clear_young_notify ptep_test_and_clear_young
> #define pmdp_clear_young_notify pmdp_test_and_clear_young
> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> index 21b67d937555..a50df42a893f 100644
> --- a/include/linux/pgtable.h
> +++ b/include/linux/pgtable.h
> @@ -1068,6 +1068,41 @@ static inline void wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
> }
> #endif
>
> +#ifndef clear_flush_young_ptes
> +/**
> + * clear_flush_young_ptes - Mark PTEs that map consecutive pages of the same
> + * folio as old and flush the TLB.
> + * @vma: The virtual memory area the pages are mapped into.
> + * @addr: Address the first page is mapped at.
> + * @ptep: Page table pointer for the first entry.
> + * @nr: Number of entries to clear access bit.
> + *
> + * May be overridden by the architecture; otherwise, implemented as a simple
> + * loop over ptep_clear_flush_young().
> + *
> + * Note that PTE bits in the PTE range besides the PFN can differ. For example,
> + * some PTEs might be write-protected.
> + *
> + * Context: The caller holds the page table lock. The PTEs map consecutive
> + * pages that belong to the same folio. The PTEs are all in the same PMD.
> + */
> +static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
> + unsigned long addr, pte_t *ptep, unsigned int nr)
> +{
> + int young = 0;
> +
> + for (;;) {
> + young |= ptep_clear_flush_young(vma, addr, ptep);
> + if (--nr == 0)
> + break;
> + ptep++;
> + addr += PAGE_SIZE;
> + }
> +
> + return young;
> +}
> +#endif

We might have an opportunity to batch the TLB synchronization here,
issuing a single flush_tlb_range() for the whole batch instead of
calling flush_tlb_page() for each PTE one by one. I'm not sure the
benefit would be significant though, especially if only one of the
nr entries has the young bit set.

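
To make the idea concrete, here is a rough userspace toy model, not
kernel code and not a proposed patch. All model_* names and
MODEL_PAGE_SIZE are made up for illustration; a bool array stands in
for the young bits and counters stand in for flush_tlb_page() and
flush_tlb_range(). It assumes the per-entry helper only flushes when
the entry was young, as the generic ptep_clear_flush_young() does.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096UL	/* hypothetical, stands in for PAGE_SIZE */

/* Hypothetical bookkeeping: counts the flushes each variant would issue. */
struct model_tlb_stats {
	unsigned int page_flushes;	/* models flush_tlb_page() calls */
	unsigned int range_flushes;	/* models flush_tlb_range() calls */
	unsigned long range_start;	/* last modelled range, inclusive */
	unsigned long range_end;	/* last modelled range, exclusive */
};

/* Models ptep_test_and_clear_young(): clear the bit, return the old value. */
static bool model_test_and_clear_young(bool *young)
{
	bool was_young = *young;

	*young = false;
	return was_young;
}

/* Per-entry variant: one modelled page flush for every young entry. */
static int model_clear_flush_young_each(bool *young, unsigned int nr,
					struct model_tlb_stats *st)
{
	int ret = 0;

	assert(nr > 0);
	for (unsigned int i = 0; i < nr; i++) {
		if (model_test_and_clear_young(&young[i])) {
			st->page_flushes++;
			ret = 1;
		}
	}
	return ret;
}

/*
 * Batched variant: clear all young bits first, then issue one modelled
 * range flush from the first young entry to the last one. The range may
 * also cover entries in between that were not young.
 */
static int model_clear_flush_young_batched(bool *young, unsigned int nr,
					   unsigned long addr,
					   struct model_tlb_stats *st)
{
	unsigned long start = 0, end = 0;
	int ret = 0;

	assert(nr > 0);
	for (unsigned int i = 0; i < nr; i++) {
		unsigned long cur = addr + i * MODEL_PAGE_SIZE;

		if (!model_test_and_clear_young(&young[i]))
			continue;
		if (!ret)
			start = cur;
		end = cur + MODEL_PAGE_SIZE;
		ret = 1;
	}

	if (ret) {
		st->range_flushes++;
		st->range_start = start;
		st->range_end = end;
	}
	return ret;
}
```

With a single young entry both variants end up with exactly one
flush, which is why I doubt the gain in that case; any win would only
show up when several entries in the batch are young.
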
Best Regards
Barry