Re: [PATCH 3/3] migrate_pages: try migrate in batch asynchronously firstly

From: Baolin Wang
Date: Wed Mar 01 2023 - 06:03:26 EST




On 3/1/2023 2:18 PM, Huang, Ying wrote:
Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx> writes:

On 2/24/2023 10:11 PM, Huang Ying wrote:
When we have locked more than one folios, we cannot wait the lock or
bit (e.g., page lock, buffer head lock, writeback bit) synchronously.
Otherwise deadlock may be triggered. This make it hard to batch the
synchronous migration directly.
This patch re-enables batching synchronous migration via trying to
migrate in batch asynchronously firstly. And any folios that are
failed to be migrated asynchronously will be migrated synchronously
one by one.
Test shows that this can restore the TLB flushing batching
performance
for synchronous migration effectively.
Signed-off-by: "Huang, Ying" <ying.huang@xxxxxxxxx>
Cc: Hugh Dickins <hughd@xxxxxxxxxx>
Cc: "Xu, Pengfei" <pengfei.xu@xxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Stefan Roesch <shr@xxxxxxxxxxxx>
Cc: Tejun Heo <tj@xxxxxxxxxx>
Cc: Xin Hao <xhao@xxxxxxxxxxxxxxxxx>
Cc: Zi Yan <ziy@xxxxxxxxxx>
Cc: Yang Shi <shy828301@xxxxxxxxx>
Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
Cc: Matthew Wilcox <willy@xxxxxxxxxxxxx>
Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
---
mm/migrate.c | 65 ++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 55 insertions(+), 10 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 91198b487e49..c17ce5ee8d92 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1843,6 +1843,51 @@ static int migrate_pages_batch(struct list_head *from, new_page_t get_new_page,
return rc;
}
+static int migrate_pages_sync(struct list_head *from, new_page_t
get_new_page,
+ free_page_t put_new_page, unsigned long private,
+ enum migrate_mode mode, int reason, struct list_head *ret_folios,
+ struct list_head *split_folios, struct migrate_pages_stats *stats)
+{
+ int rc, nr_failed = 0;
+ LIST_HEAD(folios);
+ struct migrate_pages_stats astats;
+
+ memset(&astats, 0, sizeof(astats));
+ /* Try to migrate in batch with MIGRATE_ASYNC mode firstly */
+ rc = migrate_pages_batch(from, get_new_page, put_new_page, private, MIGRATE_ASYNC,
+ reason, &folios, split_folios, &astats,
+ NR_MAX_MIGRATE_PAGES_RETRY);
+ stats->nr_succeeded += astats.nr_succeeded;
+ stats->nr_thp_succeeded += astats.nr_thp_succeeded;
+ stats->nr_thp_split += astats.nr_thp_split;
+ if (rc < 0) {
+ stats->nr_failed_pages += astats.nr_failed_pages;
+ stats->nr_thp_failed += astats.nr_thp_failed;
+ list_splice_tail(&folios, ret_folios);
+ return rc;
+ }
+ stats->nr_thp_failed += astats.nr_thp_split;
+ nr_failed += astats.nr_thp_split;
+ /*
+ * Fall back to migrate all failed folios one by one synchronously. All
+ * failed folios except split THPs will be retried, so their failure
+ * isn't counted
+ */
+ list_splice_tail_init(&folios, from);
+ while (!list_empty(from)) {
+ list_move(from->next, &folios);
+ rc = migrate_pages_batch(&folios, get_new_page, put_new_page,
+ private, mode, reason, ret_folios,
+ split_folios, stats, NR_MAX_MIGRATE_PAGES_RETRY);
+ list_splice_tail_init(&folios, ret_folios);
+ if (rc < 0)
+ return rc;
+ nr_failed += rc;
+ }
+
+ return nr_failed;
+}
+
/*
* migrate_pages - migrate the folios specified in a list, to the free folios
* supplied as the target for the page migration
@@ -1874,7 +1919,7 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
enum migrate_mode mode, int reason, unsigned int *ret_succeeded)
{
int rc, rc_gather;
- int nr_pages, batch;
+ int nr_pages;
struct folio *folio, *folio2;
LIST_HEAD(folios);
LIST_HEAD(ret_folios);
@@ -1890,10 +1935,6 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
if (rc_gather < 0)
goto out;
- if (mode == MIGRATE_ASYNC)
- batch = NR_MAX_BATCHED_MIGRATION;
- else
- batch = 1;
again:
nr_pages = 0;
list_for_each_entry_safe(folio, folio2, from, lru) {
@@ -1904,16 +1945,20 @@ int migrate_pages(struct list_head *from, new_page_t get_new_page,
}
nr_pages += folio_nr_pages(folio);
- if (nr_pages >= batch)
+ if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
break;
}
- if (nr_pages >= batch)
+ if (nr_pages >= NR_MAX_BATCHED_MIGRATION)
list_cut_before(&folios, from, &folio2->lru);
else
list_splice_init(from, &folios);
- rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
- mode, reason, &ret_folios, &split_folios, &stats,
- NR_MAX_MIGRATE_PAGES_RETRY);
+ if (mode == MIGRATE_ASYNC)
+ rc = migrate_pages_batch(&folios, get_new_page, put_new_page, private,
+ mode, reason, &ret_folios, &split_folios, &stats,
+ NR_MAX_MIGRATE_PAGES_RETRY);
+ else
+ rc = migrate_pages_sync(&folios, get_new_page, put_new_page, private,
+ mode, reason, &ret_folios, &split_folios, &stats);

For split folios, it seems also reasonable to use migrate_pages_sync()
instead of always using fixed MIGRATE_ASYNC mode?

For split folios, we only try to migrate them with minimal effort.
Previously, we decrease the retry number from 10 to 1. Now, I think
that it's reasonable to change the migration mode to MIGRATE_ASYNC to
reduce latency. They have been counted as failure anyway.

Sounds reasonable. Thanks for explanation. Please feel free to add:
Reviewed-by: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>