Re: [PATCHv5] mm: skip CMA pages when they are not available
From: 黄朝阳 (Zhaoyang Huang)
Date: Tue Aug 13 2024 - 05:59:28 EST
>
>On Wed, May 31, 2023 at 10:51:01AM +0800, zhaoyang.huang wrote:
>> From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
>>
>> This patch fixes unproductive reclaiming of CMA pages by skipping them
>> when they are not available for the current context. It arises from the
>> OOM issue below, which is caused by a large proportion of MIGRATE_CMA
>> pages among free pages.
>
>Hello,
>
>I've been looking into a problem with high memory pressure causing OOMs in
>some of our workloads, and it seems that this change may have introduced lock
>contention when there is high memory pressure.
>
>I've collected some metrics for my specific workload that suggest this change
>has increased the lruvec->lru_lock waittime-max by 500x and the
>waittime-avg by 20x.
>
>Experiment
>==========
>
>The experiment involved 100 hosts, each with 64GB of memory and a single
>Xeon 8321HC CPU. The experiment ran for over 80 hours.
>
>Half of the hosts (50) were configured with the patch reverted and lock stat
>enabled, while the other half was run against the upstream version.
>All machines had hugetlb_cma=6G set as a command-line argument.
>
>In this context, "upstream" refers to kernel release 6.9 with some minor
>changes that should not impact the results.
>
>Workload
>========
>
>The workload is a Java-based application that fully utilizes the memory; in
>fact, the JVM runs with `-Xms50735m -Xmx50735m` arguments.
>
>Results:
>=======
>
>A few values from lockstat:
>
>                  waittime-max  waittime-total  waittime-avg  holdtime-max
>6.9:                    242889     15618873933           715         17485
>6.9-with-revert:           487       688563299            34           464
>
>The full data can be seen at:
>https://docs.google.com/spreadsheets/d/1Dl-8ImlE4OZrfKjbyWAIWWuQtgD3fwEEl9INaZQZ4e8/edit?usp=sharing
>
>Possible causes:
>================
>
>I've been discussing this with colleagues and we're speculating that the high
>contention might be linked to the fact that CMA regions are now being skipped.
>This could potentially extend the duration of the
>isolate_lru_folios() 'while' loop, resulting in increased pressure on the lock.
>
>However, I want to emphasize that I'm not an expert in this area and I am
>simply sharing the data I collected.
Could you please try the patch below? It may help:
https://lore.kernel.org/linux-mm/CAOUHufa7OBtNHKMhfu8wOOE4f0w3b0_2KzzV7-hrc9rVL8e=iw@xxxxxxxxxxxxxx/
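
To illustrate the speculation quoted above about the isolate_lru_folios() 'while' loop, here is a toy user-space model, not kernel code. It assumes that a skipped CMA entry is set aside without being charged to the scan budget, so the loop has to walk further before it ends. The function name `isolate_batch`, the LRU layout (9 of every 10 entries in CMA), and the batch size of 32 are all made up for the sketch and are not taken from the report.

```python
def isolate_batch(lru, nr_to_scan, skip_cma):
    """Toy model of one isolation pass done under a single lock hold.

    lru: list of booleans, True meaning the entry sits in a CMA region.
    Returns (isolated, visited): entries taken for reclaim, and entries
    walked while the modelled lru_lock would be held.
    """
    scan = isolated = visited = 0
    while scan < nr_to_scan and visited < len(lru):
        is_cma = lru[visited]
        visited += 1
        if skip_cma and is_cma:
            # Assumption: a skipped entry does not count against the
            # scan budget, so the loop keeps walking the list.
            continue
        scan += 1
        isolated += 1
    return isolated, visited


# Hypothetical LRU: 9 out of every 10 entries are CMA-backed.
lru = [(i % 10) != 0 for i in range(10_000)]

for skip in (False, True):
    isolated, visited = isolate_batch(lru, nr_to_scan=32, skip_cma=skip)
    print(f"skip_cma={skip}: isolated={isolated} visited={visited}")
```

With these made-up inputs, the skipping variant walks 311 entries to isolate 32, against 32 entries without skipping. If the real loop behaves similarly, a longer walk per lock hold would be consistent with the higher holdtime-max and waittime figures in the lockstat data, but this model does not prove that.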