[PATCH] mm, thp: fix race between split huge page and insert into anon_vma tree

From: Kirill A. Shutemov
Date: Thu Apr 10 2014 - 08:41:04 EST


__split_huge_page() assumes that iterating over the anon_vma will catch
all VMAs the page belongs to. The assumption relies on new VMAs being
added to the tail of the VMA list, so that list_for_each_entry() still
catches them.
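
For reference, the old walk looked roughly like this (paraphrased from
memory of the pre-interval-tree code, so field names may be off):

	/*
	 * VMAs were appended to the tail of the same_anon_vma list, so
	 * a VMA added by a racing fork was still visited later in this
	 * same walk.
	 */
	list_for_each_entry(avc, &anon_vma->head, same_anon_vma)
		mapcount += __split_huge_page_splitting(page, avc->vma,
					vma_address(page, avc->vma));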

Commit bf181b9f9d8d replaced the same_anon_vma linked list with an
interval tree and, I believe, it broke that assumption.
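
The foreach here is just an rbtree range iteration ordered by page
offset, something like (from include/linux/rmap.h, cited from memory):

	#define anon_vma_interval_tree_foreach(avc, root, start, last)	\
		for (avc = anon_vma_interval_tree_iter_first(root, start, last); \
		     avc; avc = anon_vma_interval_tree_iter_next(avc, start, last))

A node inserted while the walk is in progress can sort before the
iterator's current position and never be visited.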

Let's retry the walk over the anon VMA interval tree if the number of
VMAs we found doesn't match page_mapcount().
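
In outline the walk becomes (sketch only; the real change is in the
diff below):

	mapcount = 0;
retry:
	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff)
		mapcount += __split_huge_page_splitting(page, avc->vma,
					vma_address(page, avc->vma));
	if (mapcount != page_mapcount(page))
		goto retry;	/* a racing insert may have been missed */

Note that the retry label sits after mapcount = 0: if I read
__split_huge_page_splitting() right, it returns 1 only for a pmd it
newly marks pmd_trans_splitting(), so VMAs already frozen by an earlier
pass contribute 0 and the count keeps accumulating across retries.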

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Michel Lespinasse <walken@xxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
---
mm/huge_memory.c | 28 +++++++++++++++-------------
1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 64635f5278ff..6d868a13ca3c 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1807,6 +1807,7 @@ static void __split_huge_page(struct page *page,
 	BUG_ON(PageTail(page));
 
 	mapcount = 0;
+retry:
 	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff, pgoff) {
 		struct vm_area_struct *vma = avc->vma;
 		unsigned long addr = vma_address(page, vma);
@@ -1814,19 +1815,14 @@ static void __split_huge_page(struct page *page,
 		mapcount += __split_huge_page_splitting(page, vma, addr);
 	}
 	/*
-	 * It is critical that new vmas are added to the tail of the
-	 * anon_vma list. This guarantes that if copy_huge_pmd() runs
-	 * and establishes a child pmd before
-	 * __split_huge_page_splitting() freezes the parent pmd (so if
-	 * we fail to prevent copy_huge_pmd() from running until the
-	 * whole __split_huge_page() is complete), we will still see
-	 * the newly established pmd of the child later during the
-	 * walk, to be able to set it as pmd_trans_splitting too.
+	 * There's a chance that iteration over the interval tree will race
+	 * with an insert into it. Try to catch new entries by retrying.
 	 */
-	if (mapcount != page_mapcount(page))
-		printk(KERN_ERR "mapcount %d page_mapcount %d\n",
+	if (mapcount != page_mapcount(page)) {
+		printk(KERN_DEBUG "mapcount %d page_mapcount %d\n",
 		       mapcount, page_mapcount(page));
-	BUG_ON(mapcount != page_mapcount(page));
+		goto retry;
+	}
 
 	__split_huge_page_refcount(page, list);
 
@@ -1837,10 +1833,16 @@ static void __split_huge_page(struct page *page,
 		BUG_ON(is_vma_temporary_stack(vma));
 		mapcount2 += __split_huge_page_map(page, vma, addr);
 	}
-	if (mapcount != mapcount2)
+	/*
+	 * By the time __split_huge_page_refcount() is called, all PMDs should
+	 * be marked pmd_trans_splitting() and no new mappings of the page can
+	 * be created or removed. If the number of mappings changed, it's a BUG().
+	 */
+	if (mapcount != mapcount2) {
 		printk(KERN_ERR "mapcount %d mapcount2 %d page_mapcount %d\n",
 		       mapcount, mapcount2, page_mapcount(page));
-	BUG_ON(mapcount != mapcount2);
+		BUG();
+	}
 }
 
 /*
--
Kirill A. Shutemov