On 3/18/20 5:12 PM, Kirill A. Shutemov wrote:
On Thu, Mar 19, 2020 at 07:19:42AM +0800, Yang Shi wrote:
When khugepaged collapses anonymous pages, the base pages would be freedHm. Why should it?
via pagevec or free_page_and_swap_cache(). But, the anonymous page may
be added back to LRU, then it might result in the below race:
ÂÂÂÂCPU AÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ CPU B
khugepaged:
ÂÂ unlock page
ÂÂ putback_lru_page
ÂÂÂÂ add to lru
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ page reclaim:
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ isolate this page
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ try_to_unmap
ÂÂ page_remove_rmap <-- corrupt _mapcount
It looks nothing would prevent the pages from isolating by reclaimer.
try_to_unmap() doesn't exclude parallel page unmapping. _mapcount is
protected by ptl. And this particular _mapcount pin is reachable for
reclaim as it's not part of usual page table tree. Basically
try_to_unmap() will never succeeds until we give up the _mapcount on
khugepaged side.
I don't quite get. What does "not part of usual page table tree" means?
How's about try_to_unmap() acquires ptl before khugepaged?
I don't see the issue right away.
The other problem is the page's active or unevictable flag might beSo what?
still set when freeing the page via free_page_and_swap_cache().
The flags may leak to page free path then kernel may complain if DEBUG_VM is set.
The putback_lru_page() would not clear those two flags if the pages are
released via pagevec, it sounds nothing prevents from isolating active
or unevictable pages.Again, why should it? vmscan is equipped to deal with this.
I don't mean vmscan, I mean khugepaged may isolate active and unevictable pages since it just simply walks page table.
However I didn't really run into these problems, just in theory by visualHm? But we do call putback_lru_page() on the way out. I do not follow.
inspection.
And, it also seems unnecessary to have the pages add back to LRU again since
they are about to be freed when reaching this point. So, clearing active
and unevictable flags, unlocking and dropping refcount from isolate
instead of calling putback_lru_page() as what page cache collapse does.
It just calls putback_lru_page() at error path, not success path. Putting pages back to lru on error path definitely makes sense. Here it is the success path.
Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Signed-off-by: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx>
---
 mm/khugepaged.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b679908..f42fa4e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -673,7 +673,6 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
ÂÂÂÂÂÂÂÂÂÂÂÂÂ src_page = pte_page(pteval);
ÂÂÂÂÂÂÂÂÂÂÂÂÂ copy_user_highpage(page, src_page, address, vma);
ÂÂÂÂÂÂÂÂÂÂÂÂÂ VM_BUG_ON_PAGE(page_mapcount(src_page) != 1, src_page);
-ÂÂÂÂÂÂÂÂÂÂÂ release_pte_page(src_page);
ÂÂÂÂÂÂÂÂÂÂÂÂÂ /*
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ * ptl mostly unnecessary, but preempt has to
ÂÂÂÂÂÂÂÂÂÂÂÂÂÂ * be disabled to update the per-cpu stats
@@ -687,6 +686,15 @@ static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
ÂÂÂÂÂÂÂÂÂÂÂÂÂ pte_clear(vma->vm_mm, address, _pte);
ÂÂÂÂÂÂÂÂÂÂÂÂÂ page_remove_rmap(src_page, false);
ÂÂÂÂÂÂÂÂÂÂÂÂÂ spin_unlock(ptl);
+
+ÂÂÂÂÂÂÂÂÂÂÂ dec_node_page_state(src_page,
+ÂÂÂÂÂÂÂÂÂÂÂÂÂÂÂ NR_ISOLATED_ANON + page_is_file_cache(src_page));
+ÂÂÂÂÂÂÂÂÂÂÂ ClearPageActive(src_page);
+ÂÂÂÂÂÂÂÂÂÂÂ ClearPageUnevictable(src_page);
+ÂÂÂÂÂÂÂÂÂÂÂ unlock_page(src_page);
+ÂÂÂÂÂÂÂÂÂÂÂ /* Drop refcount from isolate */
+ÂÂÂÂÂÂÂÂÂÂÂ put_page(src_page);
+
ÂÂÂÂÂÂÂÂÂÂÂÂÂ free_page_and_swap_cache(src_page);
ÂÂÂÂÂÂÂÂÂ }
ÂÂÂÂÂ }
--
1.8.3.1