Re: [PATCH] Revert "mm: skip CMA pages when they are not available"

From: Usama Arif
Date: Thu Aug 22 2024 - 11:14:35 EST




On 22/08/2024 06:43, Johannes Weiner wrote:
> On Wed, Aug 21, 2024 at 03:53:21PM -0400, Usama Arif wrote:
>> From 1aae7f04a5cb203ea2c3ede7973dd9eddbbd7a8b Mon Sep 17 00:00:00 2001
>> From: Usama Arif <usamaarif642@xxxxxxxxx>
>> Date: Wed, 21 Aug 2024 20:26:07 +0100
>> Subject: [PATCH] Revert "mm: skip CMA pages when they are not available"
>>
>> This reverts commit 5da226dbfce3a2f44978c2c7cf88166e69a6788b.
>>
>> lruvec->lru_lock is highly contended and is held when calling
>> isolate_lru_folios. If the lru has a large number of CMA folios
>> consecutively, while the allocation type requested is not
>> MIGRATE_MOVABLE, isolate_lru_folios can hold the lock for a very long
>> time while it skips those. For an FIO workload, ~150 million order-0
>> folios were skipped to isolate a few ZONE_DMA folios [1].
>> This can cause lockups [1] and high memory pressure for extended periods
>> of time [2].
>>
>> [1] https://lore.kernel.org/all/CAOUHufbkhMZYz20aM_3rHZ3OcK4m2puji2FGpUpn_-DevGk3Kg@xxxxxxxxxxxxxx/
>> [2] https://lore.kernel.org/all/ZrssOrcJIDy8hacI@xxxxxxxxx/
>>
>> Signed-off-by: Usama Arif <usamaarif642@xxxxxxxxx>
>
> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>
> I think this is the right move for now, until there is a robust
> solution for the original issue.
>
> But should b7108d66318abf3e060c7839eabcba52e9461568 be reverted along
> with it? From its changelog:
>
> No observable issue without this patch on MGLRU, but logically it make
> sense to skip the CMA page reclaim when those pages can't be satisfied for
> the current allocation context.
>
> Presumably it has the same risk/reward profile as it does on
> conventional reclaim, with long skip runs while holding the
> lruvec->lru_lock.
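
For anyone following along, the long skip runs can be illustrated with a
toy userspace model. This is only a sketch under stated assumptions, not
the kernel code: struct toy_folio, toy_isolate() and the is_cma flag are
hypothetical names, and the lock is only marked by comments. It counts how
many list entries are visited before the requested number of folios is
isolated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio on an LRU list. */
struct toy_folio {
	bool is_cma;	/* true if the folio sits in a CMA region */
};

/*
 * Walk the list in order, optionally skipping CMA folios, until
 * nr_wanted folios have been taken or the list is exhausted.
 * Returns the number of entries visited, i.e. the amount of work done
 * while the (imaginary) lock is held.
 */
static size_t toy_isolate(const struct toy_folio *lru, size_t nr,
			  size_t nr_wanted, bool skip_cma,
			  size_t *nr_taken)
{
	size_t visited = 0;

	assert(nr_taken != NULL);
	*nr_taken = 0;

	/* the lock would be taken here */
	for (size_t i = 0; i < nr && *nr_taken < nr_wanted; i++) {
		visited++;
		if (skip_cma && lru[i].is_cma)
			continue;	/* skipped: no progress, lock still held */
		(*nr_taken)++;
	}
	/* the lock would be dropped here */

	return visited;
}
```

In this model, a list with 990 CMA entries ahead of the remaining ones
needs 994 visits to take 4 folios when skipping is enabled, and 4 visits
when it is not. The visit count with skipping grows with the length of the
CMA run rather than with the number of folios requested.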

Yes, it makes sense to remove it from there as well. Doing it in a single commit below: