Re: [PATCH v3 6/6] arm64: mm: implement the architecture-specific test_and_clear_young_ptes()
From: David Hildenbrand (Arm)
Date: Fri Mar 06 2026 - 09:47:56 EST
On 3/6/26 07:43, Baolin Wang wrote:
> Implement the Arm64 architecture-specific test_and_clear_young_ptes() to enable
> batched checking of young flags, improving performance during large folio
> reclamation when MGLRU is enabled.
>
> While we're at it, simplify ptep_test_and_clear_young() by calling
> test_and_clear_young_ptes(). Since callers guarantee that PTEs are present
> before calling these functions, we can use pte_cont() to check the CONT_PTE
> flag instead of pte_valid_cont().
>
> Performance testing:
> Enable MGLRU, then allocate 10G clean file-backed folios by mmap() in a memory
> cgroup, and try to reclaim 8G file-backed folios via the memory.reclaim interface.
> I can observe 60%+ performance improvement on my Arm64 32-core server (and about
> 15% improvement on my X86 machine).
>
> W/o patchset:
> real 0m0.470s
> user 0m0.000s
> sys 0m0.470s
>
> W/ patchset:
> real 0m0.180s
> user 0m0.001s
> sys 0m0.179s
>
> Reviewed-by: Rik van Riel <riel@xxxxxxxxxxx>
> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> ---
> arch/arm64/include/asm/pgtable.h | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index aa4b13da6371..ab451d20e4c5 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -1812,16 +1812,22 @@ static inline pte_t ptep_get_and_clear(struct mm_struct *mm,
> 	return __ptep_get_and_clear(mm, addr, ptep);
> }
>
> +#define test_and_clear_young_ptes test_and_clear_young_ptes
> +static inline int test_and_clear_young_ptes(struct vm_area_struct *vma,
> +					    unsigned long addr, pte_t *ptep,
> +					    unsigned int nr)
> +{
> +	if (likely(nr == 1 && !pte_cont(__ptep_get(ptep))))
> +		return __ptep_test_and_clear_young(vma, addr, ptep);
> +
> +	return contpte_test_and_clear_young_ptes(vma, addr, ptep, nr);
> +}

Thinking out loud, what would happen if:

(a) the range spans multiple possible cont ranges (say, 64 PTEs), or
(b) the first PTE is !pte_cont(), but some of the other PTEs in the range are?
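
To make the two cases concrete, here is a small userspace toy model. It is only a sketch: struct toy_pte, toy_dispatch(), toy_blocks_spanned(), toy_range_has_cont() and TOY_CONT_PTES are made-up names, the 16-entry block size is an assumption (4K granule), and it mirrors nothing but the dispatch condition from the hunk above. It says nothing about what contpte_test_and_clear_young_ptes() does internally.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for illustration only; these are not kernel types. */
#define TOY_CONT_PTES 16	/* assumption: PTEs per cont block (4K granule) */

struct toy_pte {
	bool young;
	bool cont;
};

enum toy_path { TOY_PATH_SINGLE, TOY_PATH_CONTPTE };

/* Mirrors only the dispatch condition from the quoted hunk. */
static inline enum toy_path toy_dispatch(const struct toy_pte *ptep,
					 unsigned int nr)
{
	assert(nr > 0);
	if (nr == 1 && !ptep[0].cont)
		return TOY_PATH_SINGLE;
	return TOY_PATH_CONTPTE;
}

/* Number of TOY_CONT_PTES-sized blocks touched by [first, first + nr). */
static inline unsigned int toy_blocks_spanned(unsigned int first,
					      unsigned int nr)
{
	assert(nr > 0);
	return (first + nr - 1) / TOY_CONT_PTES - first / TOY_CONT_PTES + 1;
}

/* True if any entry in the range has the cont bit, wherever it sits. */
static inline bool toy_range_has_cont(const struct toy_pte *ptep,
				      unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++)
		if (ptep[i].cont)
			return true;
	return false;
}
```

In this model, a 64-entry range where only entries 16..31 have the cont bit set spans four blocks and always takes the contpte path, simply because nr != 1; the first PTE's cont bit only decides anything when nr == 1. So both (a) and (b) end up depending on how the contpte helper walks a range that mixes cont and non-cont entries, which the model deliberately leaves open.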
--
Cheers,
David