[PATCH 7.2 v4 02/12] mm/khugepaged: add folio dirty check after try_to_unmap()
From: Zi Yan
Date: Thu Apr 23 2026 - 22:55:12 EST
This check ensures the correctness of collapsing read-only THPs for FSes
after READ_ONLY_THP_FOR_FS is enabled by default for all FSes supporting
PMD THP pagecache.
READ_ONLY_THP_FOR_FS only supports read-only fds and uses mapping->nr_thps
and inode->i_writecount to prevent any write to the read-only folios that
are about to be collapsed. In upcoming commits, READ_ONLY_THP_FOR_FS will
be removed and that mechanism will go away with it. To ensure khugepaged
keeps working as expected after these changes, skip the collapse if any
folio is dirty after try_to_unmap(). A dirty folio at that point means the
supposedly read-only folio received writes via mmap, which can happen
between try_to_unmap() and try_to_unmap_flush() via cached TLB entries, and
khugepaged does not support collapsing writable pagecache folios yet.
Signed-off-by: Zi Yan <ziy@xxxxxxxxxx>
---
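Note for reviewers (not part of the commit message): the decision made by
the new check can be sketched as a small user-space model. This is only an
illustration under stated assumptions. Every model_* name below is
hypothetical and is not a kernel type or function. The model covers only
the dirty-bit transfer from the PTE to the folio and the resulting skip. It
does not model TLB flushing, locking order, or writeback.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a PTE and a folio; not kernel structures. */
struct model_pte {
	bool present;
	bool dirty;
};

struct model_folio {
	bool locked;
	bool dirty;
};

enum model_result {
	MODEL_SUCCEED,
	MODEL_PAGE_DIRTY_OR_WRITEBACK,
};

/*
 * Model of the unmap step: the PTE is cleared and, if it was dirty,
 * the dirty bit is transferred to the folio. The caller is assumed to
 * hold the folio lock, as in the patch.
 */
static void model_try_to_unmap(struct model_folio *folio,
			       struct model_pte *pte)
{
	assert(folio->locked);
	if (pte->present && pte->dirty)
		folio->dirty = true;
	pte->present = false;
	pte->dirty = false;
}

/* Model of the added check: skip non-shmem folios that ended up dirty. */
static enum model_result model_check_after_unmap(const struct model_folio *folio,
						 bool is_shmem)
{
	if (!is_shmem && folio->dirty)
		return MODEL_PAGE_DIRTY_OR_WRITEBACK;
	return MODEL_SUCCEED;
}

/* Run both steps for one locked, initially clean folio. */
static enum model_result model_collapse_one(bool pte_dirty, bool is_shmem)
{
	struct model_folio folio = { .locked = true, .dirty = false };
	struct model_pte pte = { .present = true, .dirty = pte_dirty };

	model_try_to_unmap(&folio, &pte);
	return model_check_after_unmap(&folio, is_shmem);
}
```

In this model, a clean PTE lets the collapse proceed, a dirty PTE on a
non-shmem folio yields the skip result, and a dirty PTE on a shmem folio is
not affected by the new check.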
mm/khugepaged.c | 28 ++++++++++++++++++++++++----
1 file changed, 24 insertions(+), 4 deletions(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 79f051eb6195..726f8ace01af 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2327,8 +2327,7 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
}
} else if (folio_test_dirty(folio)) {
/*
- * khugepaged only works on read-only fd,
- * so this page is dirty because it hasn't
+ * This page is dirty because it hasn't
* been flushed since first write. There
* won't be new dirty pages.
*
@@ -2386,8 +2385,8 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
if (!is_shmem && (folio_test_dirty(folio) ||
folio_test_writeback(folio))) {
/*
- * khugepaged only works on read-only fd, so this
- * folio is dirty because it hasn't been flushed
+ * khugepaged only works on clean file-backed folios,
+ * so this folio is dirty because it hasn't been flushed
* since first write.
*/
result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
@@ -2431,6 +2430,27 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
goto out_unlock;
}
+ /*
+ * At this point, the folio is locked and unmapped. If the PTE
+ * was dirty, try_to_unmap() has transferred the dirty bit to
+ * the folio and we must not collapse it into a clean
+ * file-backed folio.
+ *
+ * If the folio is clean here, no one can write it until we
+ * drop the folio lock. A write through a stale TLB entry came
+ * from a clean PTE and must fault because the PTE has been
+ * cleared; the fault path has to take the folio lock before
+ * installing a writable mapping. Buffered write paths also
+ * have to take the folio lock before modifying file contents
+ * without a mapping, typically via write_begin_get_folio().
+ */
+ if (!is_shmem && folio_test_dirty(folio)) {
+ result = SCAN_PAGE_DIRTY_OR_WRITEBACK;
+ xas_unlock_irq(&xas);
+ folio_putback_lru(folio);
+ goto out_unlock;
+ }
+
/*
* Accumulate the folios that are being collapsed.
*/
--
2.43.0