Re: [PATCH v2 1/3] arm64: mm: support batch clearing of the young flag for large folios
From: Baolin Wang
Date: Mon Dec 15 2025 - 22:33:08 EST
On 2025/12/15 19:36, Lorenzo Stoakes wrote:
On Thu, Dec 11, 2025 at 04:16:54PM +0800, Baolin Wang wrote:
Currently, contpte_ptep_test_and_clear_young() and contpte_ptep_clear_flush_young()
only clear the young flag and flush TLBs for PTEs within the contiguous range.
To support batch PTE operations for other sized large folios in the following
patches, adding a new parameter to specify the number of PTEs.
While we are at it, rename the functions to maintain consistency with other
contpte_*() functions.
Signed-off-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
---
arch/arm64/include/asm/pgtable.h | 12 ++++-----
arch/arm64/mm/contpte.c | 44 ++++++++++++++++++++++----------
2 files changed, 37 insertions(+), 19 deletions(-)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0944e296dd4a..e03034683156 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1679,10 +1679,10 @@ extern void contpte_clear_full_ptes(struct mm_struct *mm, unsigned long addr,
extern pte_t contpte_get_and_clear_full_ptes(struct mm_struct *mm,
unsigned long addr, pte_t *ptep,
unsigned int nr, int full);
-extern int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep);
-extern int contpte_ptep_clear_flush_young(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep);
+extern int contpte_test_and_clear_young_ptes(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep, unsigned int nr);
+extern int contpte_clear_flush_young_ptes(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep, unsigned int nr);
In core mm at least, as a convention, we strip 'extern' from header declarations
as we go as they're not necessary.
Sure. I can remove the 'extern'.
I wonder about this naming convention also. The contpte_xxx_() prefix
obviously implies we are operating upon PTEs in the contiguous range, now
we are doing something... different and 'nr' is a bit vague.
Is it the nr of contiguous ranges? Well in reality it's nr_pages right? But
Yes, 'nr' here means nr_pages, which follows the convention of the parameters in contpte_xxx_() functions.
now we're not saying they're necessarily contiguous in the sense of contpte...
I wonder whether we'd be better off introducing a new function that
explicitly has 'batch' or 'batched' in the name, and have the usual
contpte_***() stuff and callers that need batching using a new underlying helper?
OK. I get your point. However, this is outside the scope of my patchset.
Perhaps we can clean up all the contpte_xxx_() functions if everyone agrees that this is confusing.
Obviously I defer to Ryan on all this, just my thoughts...
extern void contpte_wrprotect_ptes(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, unsigned int nr);
extern int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
@@ -1854,7 +1854,7 @@ static inline int ptep_test_and_clear_young(struct vm_area_struct *vma,
if (likely(!pte_valid_cont(orig_pte)))
return __ptep_test_and_clear_young(vma, addr, ptep);
- return contpte_ptep_test_and_clear_young(vma, addr, ptep);
+ return contpte_test_and_clear_young_ptes(vma, addr, ptep, CONT_PTES);
}
#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH
@@ -1866,7 +1866,7 @@ static inline int ptep_clear_flush_young(struct vm_area_struct *vma,
if (likely(!pte_valid_cont(orig_pte)))
return __ptep_clear_flush_young(vma, addr, ptep);
- return contpte_ptep_clear_flush_young(vma, addr, ptep);
+ return contpte_clear_flush_young_ptes(vma, addr, ptep, CONT_PTES);
}
#define wrprotect_ptes wrprotect_ptes
diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
index c0557945939c..19b122441be3 100644
--- a/arch/arm64/mm/contpte.c
+++ b/arch/arm64/mm/contpte.c
@@ -488,8 +488,9 @@ pte_t contpte_get_and_clear_full_ptes(struct mm_struct *mm,
}
EXPORT_SYMBOL_GPL(contpte_get_and_clear_full_ptes);
-int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
- unsigned long addr, pte_t *ptep)
+int contpte_test_and_clear_young_ptes(struct vm_area_struct *vma,
+ unsigned long addr, pte_t *ptep,
+ unsigned int nr)
{
/*
* ptep_clear_flush_young() technically requires us to clear the access
@@ -500,39 +501,56 @@ int contpte_ptep_test_and_clear_young(struct vm_area_struct *vma,
* having to unfold.
*/
Hmm shouldn't you need to update this comment now 'nr' is a thing? E.g.:
"And since we only create a contig range when the range is covered by a single
folio, we can get away with clearing young for the whole contig range here, so
we avoid having to unfold."
I think the original comments are still clear:
"
/*
* ptep_clear_flush_young() technically requires us to clear the access
* flag for a _single_ pte. However, the core-mm code actually tracks
* access/dirty per folio, not per page. And since we only create a
* contig range when the range is covered by a single folio, we can get
* away with clearing young for the whole contig range here, so we avoid
* having to unfold.
*/
"
However now you're allowing for large folios that are not contpte?
Overall feeling pretty iffy about implementing this this way.
Please see the above comments, which explain why we should do that.
+ unsigned long start = addr;
+ unsigned long end = start + nr * PAGE_SIZE;
So now [addr, addr + nr * PAGE_SIZE) must for sure be within a single PTE
table?
Caller has made sure that (see folio_referenced_one()).
int young = 0;
int i;
- ptep = contpte_align_down(ptep);
- addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
+ if (pte_cont(__ptep_get(ptep + nr - 1)))
+ end = ALIGN(end, CONT_PTE_SIZE);
OK so I guess for PTE_CONT to be set, it must be aligned to CONT_PTE_SIZE,
with other PTE entries present there not having PTE_CONT set (Ryan - do
correct me if I'm wrong!)
I guess then no worry about running off the end of the PTE table here.
I wonder about using pte_cont_addr_end() here, but I guess you are not
intending to limit to a range here but rather capture the entire contpte
range of the end.
Right.
I don't love that now a function prefixed with contpte_... can have a
condition where part of the input range is _not_ a contpte entry, or is
specified partially...
Like I said above, this is outside the scope of my patchset. If Ryan also thinks we need to do some cleanups for all the contpte_xxx() functions in the contpte.c file, we can send a follow-up patchset to rename all the contpte_xxx() functions to make it clear.
- for (i = 0; i < CONT_PTES; i++, ptep++, addr += PAGE_SIZE)
- young |= __ptep_test_and_clear_young(vma, addr, ptep);
+ if (pte_cont(__ptep_get(ptep))) {
+ start = ALIGN_DOWN(start, CONT_PTE_SIZE);
+ ptep = contpte_align_down(ptep);
+ }
+
+ nr = (end - start) / PAGE_SIZE;
Hm don't love this reuse of input param.
OK. Will use another local variable instead.
+ for (i = 0; i < nr; i++, ptep++, start += PAGE_SIZE)
+ young |= __ptep_test_and_clear_young(vma, start, ptep);
Again, overall find it a bit iffy that we are essentially overloading the
code with non-contpte behaviour.
Let me clarify further: this is the handling logic of the contpte_xxx() functions in the contpte.c file. For example, the function contpte_set_ptes() has a parameter 'nr', which may be greater than CONT_PTES, meaning that it can handle multiple CONT_PTES range sizes of a large folio.
It might be a bit confusing, and I understand this is why you want to change the 'contpte_' prefix to the 'batch_' prefix. Of course, we can hope Ryan can provide some input on this.
Thanks for reviewing.