Re: [PATCH V3] mm: compaction: skip memory compaction when there are not enough migratable pages

From: Ge Yang
Date: Mon Jan 13 2025 - 06:28:26 EST




On 2025/1/13 18:05, Barry Song wrote:
On Mon, Jan 13, 2025 at 10:04 PM Ge Yang <yangge1116@xxxxxxx> wrote:



On 2025/1/13 16:47, Barry Song wrote:
On Thu, Jan 9, 2025 at 12:31 AM <yangge1116@xxxxxxx> wrote:

From: yangge <yangge1116@xxxxxxx>

There are 4 NUMA nodes on my machine, each with 32GB of memory, and
I have configured 16GB of CMA memory on each node. Starting a 32GB
virtual machine with device passthrough is extremely slow, taking
almost an hour.

During start-up of the virtual machine, pin_user_pages_remote(...,
FOLL_LONGTERM, ...) is called to allocate and pin its memory. Long-term
GUP cannot allocate memory from the CMA area, so at most 16GB of
non-CMA memory on a NUMA node can be used as virtual machine memory.
The 16GB of free CMA memory on a NUMA node is sufficient to pass the
order-0 watermark check, so __compaction_suitable() consistently
returns true. However, if there aren't enough migratable pages
available, performing memory compaction is meaningless. Besides
checking whether the order-0 watermark is met, __compaction_suitable()
also needs to determine whether there are sufficient migratable pages
available for memory compaction.
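
For reference, the gate in question is roughly the following (paraphrased
from __compaction_suitable() in mm/compaction.c; the exact code differs
between kernel versions). The only requirement is an order-0 watermark
check in which free CMA pages count as suitable migration targets, so the
16GB of free CMA alone keeps it passing:

        watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
                                low_wmark_pages(zone) : min_wmark_pages(zone);
        watermark += compact_gap(order);
        return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
                                   ALLOC_CMA, wmark_target);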

For costly allocations, because __compaction_suitable() always
returns true, __alloc_pages_slowpath() can't exit at the appropriate
place, resulting in excessively long virtual machine startup times.
Call trace:
__alloc_pages_slowpath
        if (compact_result == COMPACT_SKIPPED ||
            compact_result == COMPACT_DEFERRED)
                goto nopage; // should exit __alloc_pages_slowpath() from here
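
The early exit referenced above sits under the costly-order handling in
__alloc_pages_slowpath() (mm/page_alloc.c); sketched below, with the
surrounding code paraphrased since it varies across kernel versions:

        if (costly_order && (gfp_mask & __GFP_NORETRY)) {
                /*
                 * Compaction was skipped or recently deferred, so direct
                 * reclaim is unlikely to help this costly allocation;
                 * fail fast so the caller can fall back (e.g. to another
                 * node or to smaller pages).
                 */
                if (compact_result == COMPACT_SKIPPED ||
                    compact_result == COMPACT_DEFERRED)
                        goto nopage;
                ...
        }

Because __compaction_suitable() keeps returning true, compaction is
attempted and does not report COMPACT_SKIPPED, so this exit is not taken
and the slowpath keeps compacting and reclaiming.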

When the 16GB of non-CMA memory on a single node is exhausted, we will
fall back to allocating memory on other nodes. In order to fall back to
remote nodes quickly, we should skip memory compaction when migratable
pages are insufficient. After this fix, starting a 32GB virtual machine
with device passthrough takes only a few tens of seconds.

Signed-off-by: yangge <yangge1116@xxxxxxx>
---

V3:
- fix build error

V2:
- consider unevictable folios

mm/compaction.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 07bd227..a9f1261 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2383,7 +2383,27 @@ static bool __compaction_suitable(struct zone *zone, int order,
                                   int highest_zoneidx,
                                   unsigned long wmark_target)
 {
+        pg_data_t __maybe_unused *pgdat = zone->zone_pgdat;
+        unsigned long sum, nr_pinned;
         unsigned long watermark;
+
+        sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
+              node_page_state(pgdat, NR_INACTIVE_ANON) +
+              node_page_state(pgdat, NR_ACTIVE_FILE) +
+              node_page_state(pgdat, NR_ACTIVE_ANON) +
+              node_page_state(pgdat, NR_UNEVICTABLE);
+
+        nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
+                    node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
+

Does the sum of all LRU pages equal non-CMA memory?
I'm quite confused for two reasons:
1. CMA pages can be LRU pages.
2. Free pages might not belong to any LRUs.
NO.

If all the pages in the LRU are pinned, it seems unnecessary to perform
memory compaction, as the migration of pinned pages is unlikely to succeed.
Besides checking whether the order-0 watermark is met,
__compaction_suitable() also needs to determine whether there are
sufficient migratable pages available for memory compaction.

Ok, but I am not convinced that this is a correct patch. If all your
CMA pages are
used by userspace—in other words, they are in LRUs—the sum could become
quite large, and `nr_pinned` might include non-CMA pages. In that case,
`sum - nr_pinned` would also be quite large. The "return false" logic wouldn't
work as intended.
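
With hypothetical numbers, the concern looks like this (4KB pages, one
32GB node, 16GB of CMA populated with ordinary user pages, 8GB of
non-CMA memory long-term pinned):

        sum             ~= 24GB / 4KB = 6,291,456 pages
        nr_pinned       ~=  8GB / 4KB = 2,097,152 pages
        sum - nr_pinned ~=  4,194,304 pages  >>  1 << order

so the proposed "return false" path would never be taken in that case.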

I suspect the issue disappeared simply because your CMA is not being
used at all.

Part of the CMA has been used. Because __compaction_suitable() always
returns true, reclaim keeps swapping the already-used CMA pages out to
disk, ultimately leaving only pinned pages on the LRU (Least Recently
Used) lists.
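
To make that end state concrete with hypothetical numbers (4KB pages,
one 32GB node, with only the ~16GB of FOLL_LONGTERM-pinned VM memory
left on the LRU lists):

        sum             ~= 16GB / 4KB = 4,194,304 pages
        nr_pinned       ~= 16GB / 4KB = 4,194,304 pages
        sum - nr_pinned ~= 0  <  1 << order

With the patch, __compaction_suitable() then returns false, compaction
reports COMPACT_SKIPPED, and the costly allocation leaves the slowpath
and falls back to another node instead of compacting and reclaiming
indefinitely.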



+        /*
+         * Gup-pinned pages are non-migratable. After subtracting these pages,
+         * we need to check if the remaining pages are sufficient for memory
+         * compaction.
+         */
+        if ((sum - nr_pinned) < (1 << order))
+                return false;
+
         /*
          * Watermarks for order-0 must be met for compaction to be able to
          * isolate free pages for migration targets. This means that the
--
2.7.4




Thanks
Barry