Re: [PATCH v3 6/6] arm64: mm: implement the architecture-specific test_and_clear_young_ptes()

From: David Hildenbrand (Arm)

Date: Mon Mar 09 2026 - 10:45:08 EST


On 3/6/26 07:43, Baolin Wang wrote:
> Implement the Arm64 architecture-specific test_and_clear_young_ptes() to enable
> batched checking of young flags, improving performance during large folio
> reclamation when MGLRU is enabled.
>
> While we're at it, simplify ptep_test_and_clear_young() by calling
> test_and_clear_young_ptes(). Since callers guarantee that PTEs are present
> before calling these functions, we can use pte_cont() to check the CONT_PTE
> flag instead of pte_valid_cont().
>
> Performance testing:
> Enable MGLRU, then allocate 10G clean file-backed folios by mmap() in a memory
> cgroup, and try to reclaim 8G file-backed folios via the memory.reclaim interface.
> I can observe 60%+ performance improvement on my Arm64 32-core server (and about
> 15% improvement on my X86 machine).
>
> W/o patchset:
> real 0m0.470s
> user 0m0.000s
> sys 0m0.470s
>
> W/ patchset:
> real 0m0.180s
> user 0m0.001s
> sys 0m0.179s
>
> Reviewed-by: Rik van Riel <riel@xxxxxxxxxxx>
> Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> ---
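
For readers skimming the thread, the batching idea in the quoted commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the arm64 implementation from the patch: the type sketch_pte_t, the bit SKETCH_PTE_YOUNG, and both function names are hypothetical stand-ins, and the sketch does not model contiguous-PTE handling or any of the per-call overhead that the real batching avoids. It only shows the calling shape: one call covers nr consecutive entries and reports whether any of them was young.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins only; these are NOT the kernel's definitions. */
typedef uint64_t sketch_pte_t;
#define SKETCH_PTE_YOUNG (UINT64_C(1) << 10) /* hypothetical "accessed" bit */

/* Per-entry variant: test and clear the young bit of a single entry. */
static int sketch_test_and_clear_young(sketch_pte_t *ptep)
{
    assert(ptep != NULL);
    int young = (*ptep & SKETCH_PTE_YOUNG) != 0;
    *ptep &= ~SKETCH_PTE_YOUNG;
    return young;
}

/*
 * Batched variant: a single call walks nr consecutive entries, clears the
 * young bit in each, and returns 1 if at least one entry was young.
 */
static int sketch_test_and_clear_young_batch(sketch_pte_t *ptep, unsigned int nr)
{
    assert(ptep != NULL);
    int young = 0;
    for (unsigned int i = 0; i < nr; i++)
        young |= sketch_test_and_clear_young(&ptep[i]);
    return young;
}
```

A caller scanning a large folio would invoke the batched variant once for the whole range instead of once per entry, and in this sketch a second call over the same range returns 0 because every young bit has already been cleared.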

Reviewed-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>

--
Cheers,

David