[PATCH RESEND] mm: drop lruvec->lru_lock if contended when skipping folio
From: Usama Arif
Date: Mon Aug 19 2024 - 14:49:16 EST
lruvec->lru_lock is highly contended and is held when calling
isolate_lru_folios(). If the LRU contains a long consecutive run of
CMA folios while the requested allocation type is not MIGRATE_MOVABLE,
isolate_lru_folios() can hold the lock for a very long time while it
skips them. The vmscan_lru_isolate tracepoint showed the skipped count
exceeding 70k in production, and lockstat shows that waittime-max is
~1000x higher without this patch.
This can cause lockups [1] and high memory pressure for extended periods of
time [2]. Hence, release the lock if it is contended when skipping a folio,
to give other tasks a chance to acquire it and avoid stalling.
[1] https://lore.kernel.org/all/CAOUHufbkhMZYz20aM_3rHZ3OcK4m2puji2FGpUpn_-DevGk3Kg@xxxxxxxxxxxxxx/
[2] https://lore.kernel.org/all/ZrssOrcJIDy8hacI@xxxxxxxxx/
Suggested-by: Yu Zhao <yuzhao@xxxxxxxxxx>
Signed-off-by: Usama Arif <usamaarif642@xxxxxxxxx>
Reported-by: Bharata B Rao <bharata@xxxxxxx>
Tested-by: Bharata B Rao <bharata@xxxxxxx>
---
mm/vmscan.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 25e43bb3b574..bf8d39a1ad3e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1695,8 +1695,14 @@ static unsigned long isolate_lru_folios(unsigned long nr_to_scan,
if (folio_zonenum(folio) > sc->reclaim_idx ||
skip_cma(folio, sc)) {
nr_skipped[folio_zonenum(folio)] += nr_pages;
- move_to = &folios_skipped;
- goto move;
+ list_move(&folio->lru, &folios_skipped);
+ if (!spin_is_contended(&lruvec->lru_lock))
+ continue;
+ if (!list_empty(dst))
+ break;
+ spin_unlock_irq(&lruvec->lru_lock);
+ cond_resched();
+ spin_lock_irq(&lruvec->lru_lock);
}
/*
--
2.43.5