[PATCH] mm: swap: do not wait for lock_page() in unuse_pte_range()

From: Andrea Righi
Date: Wed Jul 22 2020 - 13:44:43 EST


Waiting in lock_page() with mm->mmap_sem held in unuse_pte_range() can
stall the system while swapoff is running (e.g., it becomes impossible
to ssh into the system or to execute simple commands like 'ps').

Replace lock_page() with trylock_page(). If the page cannot be locked,
unuse_pte_range() returns -EAGAIN and unuse_mm() releases mm->mmap_sem,
calls cond_resched() and retries, giving other tasks a chance to make
progress and preventing the stall.

This issue can be easily reproduced by running swapoff on systems with
a large amount of RAM (>=100GB) and a lot of pages swapped out to disk.
A specific use case is running swapoff immediately after resuming from
hibernation.

Under these conditions, without this patch applied, the system can
stall for as long as 15 minutes; with this patch applied the system
always stays responsive.

Signed-off-by: Andrea Righi <andrea.righi@xxxxxxxxxxxxx>
---
mm/swapfile.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
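
Not part of the patch: a minimal userspace sketch of the same
trylock-and-back-off pattern, for readers who want to experiment with
it outside the kernel. Everything named demo_* is hypothetical; a
pthread rwlock stands in for the mmap lock and a pthread mutex for the
page lock. Unlike the patch, which retries indefinitely, the sketch
takes a max_retries bound (an assumption added here so that it always
terminates).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical userspace stand-ins, not kernel types. */
struct demo_mm {
	pthread_rwlock_t map_lock;	/* plays the role of the mmap lock */
	pthread_mutex_t page_lock;	/* plays the role of the page lock */
	int pages_done;			/* pages processed so far */
};

/*
 * Inner step, called with map_lock held for reading. It never sleeps
 * on page_lock: if the lock is busy it reports -EAGAIN to the caller.
 */
static int demo_unuse_page(struct demo_mm *mm)
{
	if (pthread_mutex_trylock(&mm->page_lock) != 0)
		return -EAGAIN;
	mm->pages_done++;
	pthread_mutex_unlock(&mm->page_lock);
	return 0;
}

/*
 * Outer loop: on -EAGAIN, map_lock has already been dropped, so yield
 * the CPU and start over. Gives up after max_retries retries (an
 * assumption of this sketch; the patch has no such bound).
 */
static int demo_unuse_mm(struct demo_mm *mm, int max_retries, int *retries)
{
	int ret;

	assert(max_retries >= 0);
	*retries = 0;
	for (;;) {
		pthread_rwlock_rdlock(&mm->map_lock);
		ret = demo_unuse_page(mm);
		pthread_rwlock_unlock(&mm->map_lock);
		if (ret != -EAGAIN || *retries >= max_retries)
			return ret;
		(*retries)++;
		sched_yield();
	}
}
```

A caller would initialize struct demo_mm with
PTHREAD_RWLOCK_INITIALIZER and PTHREAD_MUTEX_INITIALIZER and call
demo_unuse_mm(&mm, 3, &retries). While another holder keeps page_lock,
the call returns -EAGAIN once the retry budget is used up instead of
blocking with map_lock held.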

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 987276c557d1..794935ecf82a 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1977,7 +1977,11 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				return -ENOMEM;
 			}

-			lock_page(page);
+			if (!trylock_page(page)) {
+				ret = -EAGAIN;
+				put_page(page);
+				goto out;
+			}
 			wait_on_page_writeback(page);
 			ret = unuse_pte(vma, pmd, addr, entry, page);
 			if (ret < 0) {
@@ -2100,11 +2104,17 @@ static int unuse_mm(struct mm_struct *mm, unsigned int type,
 	struct vm_area_struct *vma;
 	int ret = 0;

+retry:
 	mmap_read_lock(mm);
 	for (vma = mm->mmap; vma; vma = vma->vm_next) {
 		if (vma->anon_vma) {
 			ret = unuse_vma(vma, type, frontswap,
 					fs_pages_to_unuse);
+			if (ret == -EAGAIN) {
+				mmap_read_unlock(mm);
+				cond_resched();
+				goto retry;
+			}
 			if (ret)
 				break;
 		}
--
2.25.1