On Tue, Apr 12, 2022 at 4:07 AM Khalid Aziz <khalid.aziz@xxxxxxxxxx> wrote:
mshare'd PTEs should not be removed when a task exits. These PTEs
are removed when the last task sharing the PTEs exits. Add a check
for shared PTEs and skip them.
Signed-off-by: Khalid Aziz <khalid.aziz@xxxxxxxxxx>
---
mm/memory.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c
index c77c0d643ea8..e7c5bc6f8836 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -419,16 +419,25 @@ void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *vma,
} else {
/*
* Optimization: gather nearby vmas into one call down
+ * as long as they all belong to the same mm (that
+ * may not be the case if a vma is part of mshare'd
+ * range
*/
while (next && next->vm_start <= vma->vm_end + PMD_SIZE
- && !is_vm_hugetlb_page(next)) {
+ && !is_vm_hugetlb_page(next)
+ && vma->vm_mm == tlb->mm) {
vma = next;
next = vma->vm_next;
unlink_anon_vmas(vma);
unlink_file_vma(vma);
}
- free_pgd_range(tlb, addr, vma->vm_end,
- floor, next ? next->vm_start : ceiling);
+ /*
+ * Free pgd only if pgd is not allocated for an
+ * mshare'd range
+ */
+ if (vma->vm_mm == tlb->mm)
+ free_pgd_range(tlb, addr, vma->vm_end,
+ floor, next ? next->vm_start : ceiling);
}
vma = next;
}
@@ -1551,6 +1560,13 @@ void unmap_page_range(struct mmu_gather *tlb,
pgd_t *pgd;
unsigned long next;
+ /*
+ * If this is an mshare'd page, do not unmap it since it might
+ * still be in use.
+ */
+ if (vma->vm_mm != tlb->mm)
+ return;
+
Apart from unmap, have you tested reverse mapping in vmscan, especially
folio_referenced()? Are all the vmas of the processes sharing the page table
still in the rmap of the shared page?

Without shared PTEs, if 1000 processes share one page, we read 1000 PTEs.
With them, do we read just one, or do we read the same PTE 1000 times?
Have you tested this?
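
To make the question concrete, here is a toy userspace model, not kernel code. It assumes that each sharing process keeps its own vma in the rmap of the page and that the walk visits every vma without deduplicating page tables; whether mshare actually behaves this way is exactly what I am asking. All names (toy_pte, toy_vma, toy_referenced, toy_walk, TOY_NPROC) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPROC 1000	/* hypothetical number of sharing processes */

/* Toy stand-ins; these are NOT the kernel's pte_t or vm_area_struct. */
struct toy_pte {
	int young;		/* models the accessed bit */
	unsigned long reads;	/* how often the walk touched this PTE */
};

struct toy_vma {
	struct toy_pte *pte;	/* the PTE this vma resolves to */
};

/*
 * Visit every vma in the toy rmap, one PTE lookup per vma, and
 * test-and-clear the young bit. Returns how many lookups saw it set.
 */
static int toy_referenced(struct toy_vma *vmas, size_t n)
{
	int referenced = 0;

	for (size_t i = 0; i < n; i++) {
		assert(vmas[i].pte != NULL);
		vmas[i].pte->reads++;
		if (vmas[i].pte->young) {
			vmas[i].pte->young = 0;
			referenced++;
		}
	}
	return referenced;
}

/*
 * Build TOY_NPROC vmas that either each own a PTE (shared == 0) or
 * all point at one PTE (shared != 0), then run the walk.
 * Returns the number of distinct PTEs backing the page.
 */
static size_t toy_walk(int shared, unsigned long *max_reads, int *referenced)
{
	static struct toy_pte ptes[TOY_NPROC];
	static struct toy_vma vmas[TOY_NPROC];
	size_t distinct = shared ? 1 : TOY_NPROC;

	for (size_t i = 0; i < TOY_NPROC; i++) {
		ptes[i].young = 1;
		ptes[i].reads = 0;
		vmas[i].pte = shared ? &ptes[0] : &ptes[i];
	}

	*referenced = toy_referenced(vmas, TOY_NPROC);

	/* Largest number of times any single PTE was read. */
	*max_reads = 0;
	for (size_t i = 0; i < distinct; i++)
		if (ptes[i].reads > *max_reads)
			*max_reads = ptes[i].reads;

	return distinct;
}
```

In this model, toy_walk(0, ...) reads 1000 distinct PTEs once each, while toy_walk(1, ...) reads the single shared PTE 1000 times and only the first read sees the young bit set. If the real walk looks like the second case, the referenced count changes as well as the cost, which is why I would like to see test results.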