On Thu, Oct 31, 2024 at 9:14 AM Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx> wrote:
In retract_page_tables(), we may modify the pmd entry after acquiring the
pml and ptl, so we should also check whether the pmd entry is stable.
Why does taking the PMD lock not guarantee that the PMD entry is stable?
Using pte_offset_map_rw_nolock() + pmd_same() to do it, and then we can
also remove the calling of the pte_lockptr().
Signed-off-by: Qi Zheng <zhengqi.arch@xxxxxxxxxxxxx>
---
mm/khugepaged.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6f8d46d107b4b..6d76dde64f5fb 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1721,6 +1721,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
spinlock_t *pml;
spinlock_t *ptl;
bool skipped_uffd = false;
+ pte_t *pte;
/*
* Check vma->anon_vma to exclude MAP_PRIVATE mappings that
@@ -1756,11 +1757,25 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
addr, addr + HPAGE_PMD_SIZE);
mmu_notifier_invalidate_range_start(&range);
+ pte = pte_offset_map_rw_nolock(mm, pmd, addr, &pgt_pmd, &ptl);
+ if (!pte) {
+ mmu_notifier_invalidate_range_end(&range);
+ continue;
+ }
+
pml = pmd_lock(mm, pmd);
I don't understand why you're mapping the page table before locking
the PMD. Doesn't that just mean we need more error checking
afterwards?
- ptl = pte_lockptr(mm, pmd);
if (ptl != pml)
spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
+ if (unlikely(!pmd_same(pgt_pmd, pmdp_get_lockless(pmd)))) {
+ pte_unmap_unlock(pte, ptl);
+ if (ptl != pml)
+ spin_unlock(pml);
+ mmu_notifier_invalidate_range_end(&range);
+ continue;
+ }
+ pte_unmap(pte);
+
/*
* Huge page lock is still held, so normally the page table
* must remain empty; and we have already skipped anon_vma
--
2.20.1