Re: [PATCH RESEND] mm: drop lruvec->lru_lock if contended when skipping folio

From: Bharata B Rao
Date: Tue Aug 20 2024 - 00:39:38 EST


On 20-Aug-24 12:16 AM, Usama Arif wrote:
> lruvec->lru_lock is highly contended and is held when calling
> isolate_lru_folios. If the lru has a large number of CMA folios
> consecutively, while the allocation type requested is not MIGRATE_MOVABLE,
> isolate_lru_folios can hold the lock for a very long time while it
> skips those. vmscan_lru_isolate tracepoint showed that skipped can go
> above 70k in production and lockstat shows that waittime-max is x1000
> higher without this patch.
> This can cause lockups [1] and high memory pressure for extended periods of
> time [2]. Hence release the lock if it's contended when skipping a folio to
> give other tasks a chance to acquire it and not stall.
>
> [1] https://lore.kernel.org/all/CAOUHufbkhMZYz20aM_3rHZ3OcK4m2puji2FGpUpn_-DevGk3Kg@xxxxxxxxxxxxxx/
> [2] https://lore.kernel.org/all/ZrssOrcJIDy8hacI@xxxxxxxxx/

Though the above link [2] mentions it, can you explicitly include in the patch description the specific condition that we saw?

"isolate_lru_folios() can end up scanning through a huge number of folios with the lruvec spinlock held. For an FIO workload, ~150 million order=0 folios were skipped to isolate a few ZONE_DMA folios."
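
For illustration only, here is a minimal userspace sketch of the pattern under discussion: a scan that holds a lock while skipping ineligible entries, and yields the lock on a skip when someone else is waiting. It is not the patch's code. The toy_* types and helpers are hypothetical stand-ins, contention is modelled as a plain waiter count, and the sketch ignores that a real LRU list can change while the lock is dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a contended lock; not a kernel type. */
struct toy_lock {
	bool held;
	int waiters;	/* tasks waiting for the lock; > 0 means contended */
	int drops;	/* how many times the scanner yielded the lock */
};

/* Hypothetical stand-in for an LRU entry. */
struct toy_folio {
	bool skip;	/* ineligible for this request, e.g. a CMA folio */
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

static bool toy_lock_contended(const struct toy_lock *l)
{
	return l->waiters > 0;
}

/* Drop the lock so one waiter can run, then take it back.
 * In this model the waiter finishes instantly. */
static void toy_yield_lock(struct toy_lock *l)
{
	toy_lock_release(l);
	l->waiters--;
	l->drops++;
	toy_lock_acquire(l);
}

/* Scan up to n entries and take up to `want` eligible ones.
 * On each skipped entry, yield the lock if it is contended. */
size_t toy_isolate(struct toy_lock *l, const struct toy_folio *lru,
		   size_t n, size_t want)
{
	size_t taken = 0;

	toy_lock_acquire(l);
	for (size_t i = 0; i < n && taken < want; i++) {
		if (lru[i].skip) {
			if (toy_lock_contended(l))
				toy_yield_lock(l);
			continue;
		}
		taken++;
	}
	toy_lock_release(l);
	return taken;
}
```

With a long run of skipped entries at the head of the list, each waiter gets the lock after at most one further skip instead of waiting for the whole scan to finish.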

Regards,
Bharata.