[PATCH v2] mm: page_alloc: dump migrate-failed pages

From: Minchan Kim
Date: Mon Mar 08 2021 - 15:21:53 EST


alloc_contig_range is usually used on a CMA area or the movable zone.
A page migration failure in those areas is critical, so dump more
debugging information when it happens.

The page refcount, mapcount and page flags printed by dump_page are
helpful for deducing the culprit. Furthermore, dump_page_owner
was very helpful for finding the long-term pinner that initiated
the page allocation.

An admin can enable the dump like this (it is disabled by default):

echo "func dump_migrate_failure_pages +p" > control

An admin can disable it again:

echo "func dump_migrate_failure_pages =_" > control
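
For reference, the two commands above could be wrapped in a small helper.
This is only a sketch: it assumes debugfs is mounted at /sys/kernel/debug,
so that the dynamic debug control file is
/sys/kernel/debug/dynamic_debug/control. The helper name set_migrate_dump
and the CONTROL variable are hypothetical and not part of this patch.

```shell
#!/bin/sh
# Assumption: debugfs is mounted at /sys/kernel/debug.
# Set CONTROL in the environment if the control file lives elsewhere.
CONTROL="${CONTROL:-/sys/kernel/debug/dynamic_debug/control}"

# Hypothetical helper: toggle the dump in dump_migrate_failure_pages().
#   on  -> "+p" adds the print flag to the matching callsite
#   off -> "=_" resets the callsite to no flags
set_migrate_dump() {
	case "$1" in
	on)  flags='+p' ;;
	off) flags='=_' ;;
	*)   echo "usage: set_migrate_dump on|off" >&2; return 1 ;;
	esac
	echo "func dump_migrate_failure_pages $flags" > "$CONTROL"
}
```

Typical use would be "set_migrate_dump on" before reproducing the
failure and "set_migrate_dump off" afterwards. Writing to the control
file normally requires root.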

Signed-off-by: Minchan Kim <minchan@xxxxxxxxxx>
---
* from v1 - https://lore.kernel.org/linux-mm/20210217163603.429062-1-minchan@xxxxxxxxxx/
* use system-wide dynamic debugging instead of a per-call-site option - mhocko

mm/page_alloc.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e4b29ee2b1e..bb0aeca2069c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8453,6 +8453,34 @@ static unsigned long pfn_max_align_up(unsigned long pfn)
pageblock_nr_pages));
}

+#if defined(CONFIG_DYNAMIC_DEBUG) || \
+	(defined(CONFIG_DYNAMIC_DEBUG_CORE) && defined(DYNAMIC_DEBUG_MODULE))
+static DEFINE_RATELIMIT_STATE(alloc_contig_ratelimit_state,
+		DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST);
+int alloc_contig_ratelimit(void)
+{
+	return __ratelimit(&alloc_contig_ratelimit_state);
+}
+
+void dump_migrate_failure_pages(struct list_head *page_list)
+{
+	DEFINE_DYNAMIC_DEBUG_METADATA(descriptor,
+			"migrate failure");
+	if (DYNAMIC_DEBUG_BRANCH(descriptor) &&
+			alloc_contig_ratelimit()) {
+		struct page *page;
+
+		WARN(1, "failed callstack");
+		list_for_each_entry(page, page_list, lru)
+			dump_page(page, "migration failure");
+	}
+}
+#else
+static inline void dump_migrate_failure_pages(struct list_head *page_list)
+{
+}
+#endif
+
/* [start, end) must belong to a single zone. */
static int __alloc_contig_migrate_range(struct compact_control *cc,
unsigned long start, unsigned long end)
@@ -8496,6 +8524,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
 	}
 	if (ret < 0) {
+		dump_migrate_failure_pages(&cc->migratepages);
 		putback_movable_pages(&cc->migratepages);
 		return ret;
 	}
--
2.30.1.766.gb4fecdf3b7-goog