[PATCH] mm/khugepaged: Detecting uffd-wp vma more efficiently

From: Peter Xu
Date: Wed Sep 22 2021 - 13:52:06 EST


We forbid merging thps for uffd-wp enabled regions by stopping the khugepaged
scan as soon as we detect a uffd-wp armed pte (either present or swap).

It works, but it's inefficient, because those ptes can only exist in
VM_UFFD_WP enabled VMAs. Checking the vma flag instead is cheaper, and good
enough. To be explicit: before this patch we could still merge some thps in
VM_UFFD_WP regions as long as they had zero uffd-wp armed ptes; however,
that's not a major target for thp collapse anyway.
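
For illustration only (not part of the patch), the difference between the two
checks can be sketched in userspace C. Everything prefixed toy_/TOY_ is a
hypothetical stand-in and does not match kernel layouts or flag values; the
sketch only assumes that the old check walks ptes looking for a uffd-wp armed
one, while the new check tests the vma flags once:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; NOT the kernel's VM_UFFD_WP / VM_UFFD_MINOR values. */
#define TOY_VM_UFFD_WP    0x1UL
#define TOY_VM_UFFD_MINOR 0x2UL

/* Hypothetical pte: only records whether it is uffd-wp armed. */
struct toy_pte {
	bool uffd_wp;
};

/* Hypothetical vma: flags plus the ptes covering one pmd range. */
struct toy_vma {
	unsigned long vm_flags;
	const struct toy_pte *ptes;
	size_t nr_ptes;
};

/* Old style: walk the ptes and refuse on the first uffd-wp armed one. */
static bool toy_collapse_allowed_pte_scan(const struct toy_vma *vma)
{
	assert(vma->ptes != NULL || vma->nr_ptes == 0);

	for (size_t i = 0; i < vma->nr_ptes; i++) {
		if (vma->ptes[i].uffd_wp)
			return false;
	}
	return true;
}

/* New style: a single flag test on the vma, no pte walk at all. */
static bool toy_collapse_allowed_vma_check(const struct toy_vma *vma)
{
	return !(vma->vm_flags & (TOY_VM_UFFD_WP | TOY_VM_UFFD_MINOR));
}
```

Note that for a wp-registered vma with no armed ptes, the pte scan would allow
the collapse while the vma check refuses it, which is the behavior change
described above.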

This mostly reverts commit e1e267c7928fe387e5e1cffeafb0de2d0473663a, but does
the same check at the vma level instead, so it's not a bugfix.

This also paves the way for file-backed uffd-wp support, as the VM_UFFD_WP flag
will work for file-backed too.

After this patch, the scan result that khugepaged reports for these regions
switches from SCAN_PTE_UFFD_WP to SCAN_VMA_CHECK.

Since uffd minor mode should not allow thp either, do the same thing for minor
mode, so that khugepaged stops early instead of trying to collapse pages.

Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Nadav Amit <nadav.amit@xxxxxxxxx>
Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---

Axel: as I asked in the other thread, please help check whether minor mode
works properly with shmem thp enabled. If not, I feel like this patch could
end up being part of that effort, but it's also possible that I missed
something.

include/trace/events/huge_memory.h | 1 -
mm/khugepaged.c | 26 +++-----------------------
2 files changed, 3 insertions(+), 24 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 4fdb14a81108..53532f5925c3 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -15,7 +15,6 @@
EM( SCAN_EXCEED_SWAP_PTE, "exceed_swap_pte") \
EM( SCAN_EXCEED_SHARED_PTE, "exceed_shared_pte") \
EM( SCAN_PTE_NON_PRESENT, "pte_non_present") \
- EM( SCAN_PTE_UFFD_WP, "pte_uffd_wp") \
EM( SCAN_PAGE_RO, "no_writable_page") \
EM( SCAN_LACK_REFERENCED_PAGE, "lack_referenced_page") \
EM( SCAN_PAGE_NULL, "page_null") \
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 045cc579f724..3afe66d48db0 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -31,7 +31,6 @@ enum scan_result {
SCAN_EXCEED_SWAP_PTE,
SCAN_EXCEED_SHARED_PTE,
SCAN_PTE_NON_PRESENT,
- SCAN_PTE_UFFD_WP,
SCAN_PAGE_RO,
SCAN_LACK_REFERENCED_PAGE,
SCAN_PAGE_NULL,
@@ -467,6 +466,9 @@ static bool hugepage_vma_check(struct vm_area_struct *vma,
return false;
if (vma_is_temporary_stack(vma))
return false;
+ /* Don't allow thp merging for wp/minor enabled uffd regions */
+ if (userfaultfd_wp(vma) || userfaultfd_minor(vma))
+ return false;
return !(vm_flags & VM_NO_KHUGEPAGED);
}

@@ -1246,15 +1248,6 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
pte_t pteval = *_pte;
if (is_swap_pte(pteval)) {
if (++unmapped <= khugepaged_max_ptes_swap) {
- /*
- * Always be strict with uffd-wp
- * enabled swap entries. Please see
- * comment below for pte_uffd_wp().
- */
- if (pte_swp_uffd_wp(pteval)) {
- result = SCAN_PTE_UFFD_WP;
- goto out_unmap;
- }
continue;
} else {
result = SCAN_EXCEED_SWAP_PTE;
@@ -1270,19 +1263,6 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
goto out_unmap;
}
}
- if (pte_uffd_wp(pteval)) {
- /*
- * Don't collapse the page if any of the small
- * PTEs are armed with uffd write protection.
- * Here we can also mark the new huge pmd as
- * write protected if any of the small ones is
- * marked but that could bring unknown
- * userfault messages that falls outside of
- * the registered range. So, just be simple.
- */
- result = SCAN_PTE_UFFD_WP;
- goto out_unmap;
- }
if (pte_write(pteval))
writable = true;

--
2.31.1