Re: [PATCH v6 4/5] arm64: mm: implement the architecture-specific clear_flush_young_ptes()

From: David Hildenbrand (Arm)

Date: Mon Feb 09 2026 - 10:36:10 EST


On 2/9/26 15:07, Baolin Wang wrote:
Implement the Arm64 architecture-specific clear_flush_young_ptes() to enable
batched checking of young flags and TLB flushing, improving performance during
large folio reclamation.

Performance testing:
Allocate 10G clean file-backed folios by mmap() in a memory cgroup, and try to
reclaim 8G file-backed folios via the memory.reclaim interface. I can observe
33% performance improvement on my Arm64 32-core server (and 10%+ improvement
on my X86 machine). Meanwhile, the hotspot folio_check_references() dropped
from approximately 35% to around 5%.

W/o patchset:
real 0m1.518s
user 0m0.000s
sys 0m1.518s

W/ patchset:
real 0m1.018s
user 0m0.000s
sys 0m1.018s

Reviewed-by: Ryan Roberts <ryan.roberts@xxxxxxx>
Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
---
arch/arm64/include/asm/pgtable.h | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 3dabf5ea17fa..a17eb8a76788 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1838,6 +1838,17 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
return contpte_clear_flush_young_ptes(vma, addr, ptep, 1);
}
+#define clear_flush_young_ptes clear_flush_young_ptes
+static inline int clear_flush_young_ptes(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep,
+ unsigned int nr)
+{
+ if (likely(nr == 1 && !pte_cont(__ptep_get(ptep))))
I guess similar cases where we should never end up with non-present ptes should be updated accordingly.

ptep_test_and_clear_young(), for example, should never be called on non-present ptes.

Reviewed-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>

--
Cheers,

David