On Tue, May 27, 2025 at 01:20:49PM +0530, Dev Jain wrote:
Use folio_pte_batch() to optimize move_ptes(). On arm64, if the ptes
are painted with the contig bit, then ptep_get() will iterate through all 16
entries to collect a/d bits. Hence this optimization will result in a 16x
reduction in the number of ptep_get() calls. Next, ptep_get_and_clear()
will eventually call contpte_try_unfold() on every contig block, thus
flushing the TLB for the complete large folio range. Instead, use
get_and_clear_full_ptes() so as to elide TLBIs on each contig block, and only
do them on the starting and ending contig block.

But you're also making this applicable to non-contpte cases?

See below, but the commit message should clearly point out this is general
for page table split large folios (unless I've missed something of course!
:)

Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
---
mm/mremap.c | 40 +++++++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 7 deletions(-)
diff --git a/mm/mremap.c b/mm/mremap.c
index 0163e02e5aa8..580b41f8d169 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -170,6 +170,24 @@ static pte_t move_soft_dirty_pte(pte_t pte)
return pte;
}
+/* mremap a batch of PTEs mapping the same large folio */

I think this comment is fairly useless, it basically spells out the function
name.

I'd prefer something like 'determine if a PTE contains physically contiguous
entries which map the same large folio'.

+static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr,
+ pte_t *ptep, pte_t pte, int max_nr)
+{
+ const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+ struct folio *folio;
+
+ if (max_nr == 1)
+ return 1;
+
+ folio = vm_normal_folio(vma, addr, pte);
+ if (!folio || !folio_test_large(folio))
+ return 1;
+
+ return folio_pte_batch(folio, addr, ptep, pte, max_nr, flags, NULL,
+ NULL, NULL);
+}

The code is much better however! :)

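
For anyone following along, the batching this helper relies on can be
modelled in userspace roughly as below. This is only a sketch: toy_folio,
toy_pte and toy_pte_batch() are made-up names for illustration, and the real
folio_pte_batch() also compares the remaining PTE bits (subject to the FPB_*
flags), which this toy ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy types, not the kernel's struct folio or pte_t. */
struct toy_folio {
	unsigned long first_pfn;	/* PFN of the folio's first page */
	unsigned long nr_pages;		/* number of pages in the folio */
};

struct toy_pte {
	bool present;
	unsigned long pfn;
};

/*
 * Count how many consecutive entries, starting at ptep[0], are present,
 * map physically consecutive pages, and stay within the same folio.
 * Never returns more than max_nr.
 */
static int toy_pte_batch(const struct toy_folio *folio,
			 const struct toy_pte *ptep, int max_nr)
{
	unsigned long folio_end = folio->first_pfn + folio->nr_pages;
	int nr = 1;

	assert(max_nr >= 1 && ptep[0].present);
	assert(ptep[0].pfn >= folio->first_pfn && ptep[0].pfn < folio_end);

	while (nr < max_nr) {
		const struct toy_pte *next = &ptep[nr];

		if (!next->present)
			break;
		if (next->pfn != ptep[0].pfn + nr)	/* not contiguous */
			break;
		if (next->pfn >= folio_end)		/* left the folio */
			break;
		nr++;
	}
	return nr;
}
```

Note that nothing in this model depends on the contig bit: any run of PTEs
mapping consecutive pages of one large folio forms a batch.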
+
static int move_ptes(struct pagetable_move_control *pmc,
unsigned long extent, pmd_t *old_pmd, pmd_t *new_pmd)
{
@@ -177,7 +195,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
bool need_clear_uffd_wp = vma_has_uffd_without_event_remap(vma);
struct mm_struct *mm = vma->vm_mm;
pte_t *old_ptep, *new_ptep;
- pte_t pte;
+ pte_t old_pte, pte;
pmd_t dummy_pmdval;
spinlock_t *old_ptl, *new_ptl;
bool force_flush = false;
@@ -185,6 +203,8 @@ static int move_ptes(struct pagetable_move_control *pmc,
unsigned long new_addr = pmc->new_addr;
unsigned long old_end = old_addr + extent;
unsigned long len = old_end - old_addr;
+ int max_nr_ptes;
+ int nr_ptes;
int err = 0;
/*
@@ -236,12 +256,14 @@ static int move_ptes(struct pagetable_move_control *pmc,
flush_tlb_batched_pending(vma->vm_mm);
arch_enter_lazy_mmu_mode();
- for (; old_addr < old_end; old_ptep++, old_addr += PAGE_SIZE,
- new_ptep++, new_addr += PAGE_SIZE) {
- if (pte_none(ptep_get(old_ptep)))
+ for (; old_addr < old_end; old_ptep += nr_ptes, old_addr += nr_ptes * PAGE_SIZE,
+ new_ptep += nr_ptes, new_addr += nr_ptes * PAGE_SIZE) {
+ nr_ptes = 1;
+ max_nr_ptes = (old_end - old_addr) >> PAGE_SHIFT;
+ old_pte = ptep_get(old_ptep);
+ if (pte_none(old_pte))
continue;
- pte = ptep_get_and_clear(mm, old_addr, old_ptep);
/*
* If we are remapping a valid PTE, make sure
* to flush TLB before we drop the PTL for the
@@ -253,8 +275,12 @@ static int move_ptes(struct pagetable_move_control *pmc,
* the TLB entry for the old mapping has been
* flushed.
*/
- if (pte_present(pte))
+ if (pte_present(old_pte)) {
+ nr_ptes = mremap_folio_pte_batch(vma, old_addr, old_ptep,
+ old_pte, max_nr_ptes);
force_flush = true;
+ }
+ pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr_ptes, 0);

Just to clarify, in the previous revision you said:

"Split THPs won't be batched; you can use pte_batch() (from David's refactoring)
and figure the split THP batch out, but then get_and_clear_full_ptes() will be
gathering a/d bits and smearing them across the batch, which will be incorrect."

But... this will be triggered for page table split large folios, no?

So is there something wrong here or not?

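To make the question concrete, here is a userspace toy model of the
behaviour I am asking about. It is a sketch, not kernel code: the toy_* names
are invented for illustration, and it assumes (as the quoted statement
suggests) that the batched clear folds the young/dirty bits of every entry
into the one PTE it returns, and that the batched set then writes those bits
to every entry with an advancing PFN.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy PTE: a PFN plus accessed (young) and dirty bits. */
struct toy_ad_pte {
	unsigned long pfn;
	bool young;
	bool dirty;
};

/*
 * Toy stand-in for a batched get-and-clear: clear nr entries and return
 * the first one with young/dirty OR-ed in from the whole batch.
 */
static struct toy_ad_pte toy_get_and_clear_ptes(struct toy_ad_pte *ptep, int nr)
{
	struct toy_ad_pte pte;
	int i;

	assert(nr >= 1);
	pte = ptep[0];
	for (i = 0; i < nr; i++) {
		pte.young = pte.young || ptep[i].young;
		pte.dirty = pte.dirty || ptep[i].dirty;
		ptep[i] = (struct toy_ad_pte){ 0, false, false };
	}
	return pte;
}

/*
 * Toy stand-in for a batched set: write the same bits to nr entries,
 * advancing only the PFN.
 */
static void toy_set_ptes(struct toy_ad_pte *ptep, struct toy_ad_pte pte, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		ptep[i] = pte;
		ptep[i].pfn = pte.pfn + i;
	}
}

/* Count dirty entries, to compare before and after a batched move. */
static int toy_count_dirty(const struct toy_ad_pte *ptep, int nr)
{
	int i, count = 0;

	for (i = 0; i < nr; i++)
		if (ptep[i].dirty)
			count++;
	return count;
}
```

In this model, moving a batch of four entries of which only one is dirty
leaves all four destination entries dirty, which is the "smearing" referred
to above.
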
pte = move_pte(pte, old_addr, new_addr);
pte = move_soft_dirty_pte(pte);
@@ -267,7 +293,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
else if (is_swap_pte(pte))
pte = pte_swp_clear_uffd_wp(pte);
}
- set_pte_at(mm, new_addr, new_ptep, pte);
+ set_ptes(mm, new_addr, new_ptep, pte, nr_ptes);
}
}
--
2.30.2

The code looks much better here after refactoring, however!