Re: [PATCH] Revert "mm: skip CMA pages when they are not available"

From: Johannes Weiner
Date: Thu Aug 22 2024 - 06:43:41 EST


On Wed, Aug 21, 2024 at 03:53:21PM -0400, Usama Arif wrote:
> From 1aae7f04a5cb203ea2c3ede7973dd9eddbbd7a8b Mon Sep 17 00:00:00 2001
> From: Usama Arif <usamaarif642@xxxxxxxxx>
> Date: Wed, 21 Aug 2024 20:26:07 +0100
> Subject: [PATCH] Revert "mm: skip CMA pages when they are not available"
>
> This reverts commit 5da226dbfce3a2f44978c2c7cf88166e69a6788b.
>
> lruvec->lru_lock is highly contended and is held when calling
> isolate_lru_folios. If the lru has a large number of CMA folios
> consecutively, while the allocation type requested is not
> MIGRATE_MOVABLE, isolate_lru_folios can hold the lock for a very long
> time while it skips those. For FIO workload, ~150million order=0
> folios were skipped to isolate a few ZONE_DMA folios [1].
> This can cause lockups [1] and high memory pressure for extended periods
> of time [2].
>
> [1] https://lore.kernel.org/all/CAOUHufbkhMZYz20aM_3rHZ3OcK4m2puji2FGpUpn_-DevGk3Kg@xxxxxxxxxxxxxx/
> [2] https://lore.kernel.org/all/ZrssOrcJIDy8hacI@xxxxxxxxx/
>
> Signed-off-by: Usama Arif <usamaarif642@xxxxxxxxx>

Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>

I think this is the right move for now, until there is a robust
solution for the original issue.

But hould b7108d66318abf3e060c7839eabcba52e9461568 be reverted along
with it? From its changelog:

No observable issue without this patch on MGLRU, but logically it make
sense to skip the CMA page reclaim when those pages can't be satisfied for
the current allocation context.

Presumably it has the same risk reward profile as it does on
conventional reclaim, with long skip runs while holding the
lruvec->lock.