[PATCH] mm/rmap: update to new mmu_notifier semantic v2

From: Jérôme Glisse
Date: Thu Aug 31 2017 - 17:17:27 EST


commit 369ea8242c0fb5239b4ddf0dc568f694bd244de4 upstream.

Please note that this patch differs from mainline: it does not replace
mmu_notifier_invalidate_page() with mmu_notifier_invalidate_range(),
because that would require changes to most existing mmu notifiers, and we
do not want to change the semantics of this API in old kernels. Jerome
has suggested that it should be sufficient to wrap
mmu_notifier_invalidate_page() in *_invalidate_range_start()/end() to fix
invalidation of mappings larger than a pte (e.g. THP/hugetlb pages during
migration). We need this change to handle migration of large
(hugetlb/THP) pages properly.

Note that because we cannot presume the pmd value or pte value, we have
to assume the worst and unconditionally report an invalidation as
happening.

Changed since v2:
- try_to_unmap_one() only one call to mmu_notifier_invalidate_range()
- compute end with PAGE_SIZE << compound_order(page)
- fix PageHuge() case in try_to_unmap_one()
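
For illustration only, the range computation can be modelled in userspace
as below. This is a minimal sketch, not kernel code: it assumes 4 KiB base
pages, and every sketch_* name is a hypothetical stand-in (the order
argument plays the role of compound_order(page), vm_end that of
vma->vm_end).

```c
#include <assert.h>

/* Assumption: 4 KiB base pages; the real PAGE_SIZE is per-architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for the [start, end) range given to the notifier. */
struct sketch_range {
	unsigned long start;
	unsigned long end;
};

static unsigned long sketch_min(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

/*
 * Models:
 *   end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page)));
 * The range covers the whole compound page but never extends past the
 * end of the vma.
 */
static struct sketch_range sketch_invalidate_range(unsigned long address,
						   unsigned long vm_end,
						   unsigned int order)
{
	struct sketch_range r;

	r.start = address;
	r.end = sketch_min(vm_end, address + (SKETCH_PAGE_SIZE << order));
	return r;
}

/* Self-check covering a base page, a 2 MiB huge page and a clamped range. */
static void sketch_self_check(void)
{
	/* order 0: exactly one base page */
	assert(sketch_invalidate_range(0x1000UL, 0x10000UL, 0).end == 0x2000UL);
	/* order 9: 4 KiB << 9 = 2 MiB */
	assert(sketch_invalidate_range(0x200000UL, 0x800000UL, 9).end == 0x400000UL);
	/* vma ends inside the compound page: clamp to vm_end */
	assert(sketch_invalidate_range(0x200000UL, 0x300000UL, 9).end == 0x300000UL);
}
```

Clamping to the vma end keeps the reported range inside the mapping being
unmapped even when the compound page would otherwise extend past it.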

Signed-off-by: Jérôme Glisse <jglisse@xxxxxxxxxx>
Reviewed-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Cc: Bernhard Held <berny156@xxxxxx>
Cc: Adam Borowski <kilobyte@xxxxxxxxxx>
Cc: Radim Krčmář <rkrcmar@xxxxxxxxxx>
Cc: Wanpeng Li <kernellwp@xxxxxxxxx>
Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: Takashi Iwai <tiwai@xxxxxxx>
Cc: Nadav Amit <nadav.amit@xxxxxxxxx>
Cc: Mike Galbraith <efault@xxxxxx>
Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: axie <axie@xxxxxxx>
Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Michal Hocko <mhocko@xxxxxxxx> # backport to 4.4
---
mm/rmap.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/mm/rmap.c b/mm/rmap.c
index 1bceb49aa214..364d245e6411 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1324,6 +1324,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
pte_t pteval;
spinlock_t *ptl;
int ret = SWAP_AGAIN;
+ unsigned long start = address, end;
enum ttu_flags flags = (enum ttu_flags)arg;

/* munlock has nothing to gain from examining un-locked vmas */
@@ -1356,6 +1357,14 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
}
}

+ /*
+ * We have to assume the worse case ie pmd for invalidation. Note that
+ * the page can not be free in this function as call of try_to_unmap()
+ * must hold a reference on the page.
+ */
+ end = min(vma->vm_end, start + (PAGE_SIZE << compound_order(page)));
+ mmu_notifier_invalidate_range_start(vma->vm_mm, start, end);
+
/* Nuke the page table entry. */
flush_cache_page(vma, address, page_to_pfn(page));
if (should_defer_flush(mm, flags)) {
@@ -1449,6 +1458,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
pte_unmap_unlock(pte, ptl);
if (ret != SWAP_FAIL && ret != SWAP_MLOCK && !(flags & TTU_MUNLOCK))
mmu_notifier_invalidate_page(mm, address);
+ mmu_notifier_invalidate_range_end(vma->vm_mm, start, end);
out:
return ret;
}
--
2.18.0

--
Michal Hocko
SUSE Labs