Re: [PATCH 3/4] mm/vmscan: Attempt to migrate page in lieu of discard
From: Yang Shi
Date: Thu Oct 17 2019 - 13:30:13 EST
On Wed, Oct 16, 2019 at 3:14 PM Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx> wrote:
>
>
> From: Keith Busch <keith.busch@xxxxxxxxx>
>
> If a memory node has a preferred migration path to demote cold pages,
> attempt to move those inactive pages to that migration node before
> reclaiming. This will better utilize available memory, provide a faster
> tier than swapping or discarding, and allow such pages to be reused
> immediately without IO to retrieve the data.
>
> Much like swap, this is an opt-in feature that requires the user to define
> where to send pages when reclaiming them. When handling anonymous pages,
> this will be considered before swap if enabled. Should the demotion fail
> for any reason, the page reclaim will proceed as if the demotion feature
> was not enabled.
>
> Some places we would like to see this used:
>
> 1. Persistent memory being used as a slower, cheaper DRAM replacement
> 2. Remote memory-only "expansion" NUMA nodes
> 3. Resolving memory imbalances where one NUMA node is seeing more
> allocation activity than another. This helps keep more recent
> allocations closer to the CPUs on the node doing the allocating.
>
> Signed-off-by: Keith Busch <keith.busch@xxxxxxxxx>
> Co-developed-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> ---
>
> b/include/linux/migrate.h | 6 ++++
> b/include/trace/events/migrate.h | 3 +-
> b/mm/debug.c | 1
> b/mm/migrate.c | 51 +++++++++++++++++++++++++++++++++++++++
> b/mm/vmscan.c | 27 ++++++++++++++++++++
> 5 files changed, 87 insertions(+), 1 deletion(-)
>
> diff -puN include/linux/migrate.h~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard include/linux/migrate.h
> --- a/include/linux/migrate.h~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard 2019-10-16 15:06:58.090952593 -0700
> +++ b/include/linux/migrate.h 2019-10-16 15:06:58.103952593 -0700
> @@ -25,6 +25,7 @@ enum migrate_reason {
> MR_MEMPOLICY_MBIND,
> MR_NUMA_MISPLACED,
> MR_CONTIG_RANGE,
> + MR_DEMOTION,
> MR_TYPES
> };
>
> @@ -79,6 +80,7 @@ extern int migrate_huge_page_move_mappin
> extern int migrate_page_move_mapping(struct address_space *mapping,
> struct page *newpage, struct page *page, enum migrate_mode mode,
> int extra_count);
> +extern int migrate_demote_mapping(struct page *page);
> #else
>
> static inline void putback_movable_pages(struct list_head *l) {}
> @@ -105,6 +107,10 @@ static inline int migrate_huge_page_move
> return -ENOSYS;
> }
>
> +static inline int migrate_demote_mapping(struct page *page)
> +{
> + return -ENOSYS;
> +}
> #endif /* CONFIG_MIGRATION */
>
> #ifdef CONFIG_COMPACTION
> diff -puN include/trace/events/migrate.h~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard include/trace/events/migrate.h
> --- a/include/trace/events/migrate.h~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard 2019-10-16 15:06:58.092952593 -0700
> +++ b/include/trace/events/migrate.h 2019-10-16 15:06:58.103952593 -0700
> @@ -20,7 +20,8 @@
> EM( MR_SYSCALL, "syscall_or_cpuset") \
> EM( MR_MEMPOLICY_MBIND, "mempolicy_mbind") \
> EM( MR_NUMA_MISPLACED, "numa_misplaced") \
> - EMe(MR_CONTIG_RANGE, "contig_range")
> + EM( MR_CONTIG_RANGE, "contig_range") \
> + EMe(MR_DEMOTION, "demotion")
>
> /*
> * First define the enums in the above macros to be exported to userspace
> diff -puN mm/debug.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard mm/debug.c
> --- a/mm/debug.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard 2019-10-16 15:06:58.094952593 -0700
> +++ b/mm/debug.c 2019-10-16 15:06:58.103952593 -0700
> @@ -25,6 +25,7 @@ const char *migrate_reason_names[MR_TYPE
> "mempolicy_mbind",
> "numa_misplaced",
> "cma",
> + "demotion",
> };
>
> const struct trace_print_flags pageflag_names[] = {
> diff -puN mm/migrate.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard mm/migrate.c
> --- a/mm/migrate.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard 2019-10-16 15:06:58.097952593 -0700
> +++ b/mm/migrate.c 2019-10-16 15:06:58.104952593 -0700
> @@ -1119,6 +1119,57 @@ out:
> return rc;
> }
>
> +static struct page *alloc_demote_node_page(struct page *page, unsigned long node)
> +{
> + /*
> + * The flags are set to allocate only on the desired node in the
> + * migration path, and to fail fast if not immediately available. We
> + * are already doing memory reclaim, we don't want heroic efforts to
> + * get a page.
> + */
> + gfp_t mask = GFP_NOWAIT | __GFP_NOWARN | __GFP_NORETRY |
> + __GFP_NOMEMALLOC | __GFP_THISNODE | __GFP_MOVABLE;
> + struct page *newpage;
> +
> + if (PageTransHuge(page)) {
> + mask |= __GFP_COMP;
> + newpage = alloc_pages_node(node, mask, HPAGE_PMD_ORDER);
> + if (newpage)
> + prep_transhuge_page(newpage);
> + } else
> + newpage = alloc_pages_node(node, mask, 0);
> +
> + return newpage;
> +}
> +
> +/**
> + * migrate_demote_mapping() - Migrate this page and its mappings to its
> + * demotion node.
> + * @page: A locked, isolated, non-huge page that should migrate to its current
> + * node's demotion target, if available. Since this is intended to be
> + * called during memory reclaim, all flag options are set to fail fast.
> + *
> + * @returns: MIGRATEPAGE_SUCCESS if successful, -errno otherwise.
> + */
> +int migrate_demote_mapping(struct page *page)
> +{
> + int next_nid = next_migration_node(page_to_nid(page));
> +
> + VM_BUG_ON_PAGE(!PageLocked(page), page);
> + VM_BUG_ON_PAGE(PageHuge(page), page);
> + VM_BUG_ON_PAGE(PageLRU(page), page);
> +
> + if (next_nid < 0)
> + return -ENOSYS;
> + if (PageTransHuge(page) && !thp_migration_supported())
> + return -ENOMEM;
> +
> + /* MIGRATE_ASYNC is the most light weight and never blocks.*/
> + return __unmap_and_move(alloc_demote_node_page, NULL, next_nid,
> + page, MIGRATE_ASYNC, MR_DEMOTION);
> +}
> +
> +
> /*
> * gcc 4.7 and 4.8 on arm get an ICEs when inlining unmap_and_move(). Work
> * around it.
> diff -puN mm/vmscan.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard mm/vmscan.c
> --- a/mm/vmscan.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard 2019-10-16 15:06:58.099952593 -0700
> +++ b/mm/vmscan.c 2019-10-16 15:06:58.105952593 -0700
> @@ -1262,6 +1262,33 @@ static unsigned long shrink_page_list(st
> ; /* try to reclaim the page below */
> }
>
> + if (!PageHuge(page)) {
> + int rc = migrate_demote_mapping(page);
> +
> + /*
> + * -ENOMEM on a THP may indicate either migration is
> + * unsupported or there was not enough contiguous
> + * space. Split the THP into base pages and retry the
> + * head immediately. The tail pages will be considered
> + * individually within the current loop's page list.
> + */
> + if (rc == -ENOMEM && PageTransHuge(page) &&
> + !split_huge_page_to_list(page, page_list))
> + rc = migrate_demote_mapping(page);
I recall that when Keith first posted this patch, I asked why we don't
just migrate the THP as a whole. migrate_pages() can already handle
this: if migrating the whole THP fails, it falls back to base pages.

Since the most optimistic gfp flags are used, the allocation should not
get trapped in nested direct reclaim. migrate_pages() should simply
return failure for the THP and then fall back to base pages.
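
The fallback described above can be modeled in userspace roughly as
follows. This is only a sketch of the control flow, not kernel code:
every sketch_* name is hypothetical, the "node" is a pair of counters
rather than a real NUMA node, and 512 base pages per THP assumes 2MB
huge pages over 4KB base pages.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model constants and types; none of this is kernel API. */
#define SKETCH_THP_NR_PAGES 512	/* assumes 2MB THP over 4KB base pages */

struct sketch_node {
	size_t free_base_pages;	/* free order-0 pages on the demotion target */
	bool has_contig_huge;	/* is a THP-sized contiguous block free? */
};

/* Fail-fast attempt to place the whole THP on the target node. */
static int sketch_migrate_huge(struct sketch_node *target)
{
	if (!target->has_contig_huge ||
	    target->free_base_pages < SKETCH_THP_NR_PAGES)
		return -ENOMEM;

	target->has_contig_huge = false;
	target->free_base_pages -= SKETCH_THP_NR_PAGES;
	return 0;
}

/* Fail-fast attempt to place one base page on the target node. */
static int sketch_migrate_base(struct sketch_node *target)
{
	if (target->free_base_pages == 0)
		return -ENOMEM;

	target->free_base_pages--;
	return 0;
}

/*
 * Try the THP as a whole first. Only if that fails, "split" it and
 * move base pages one by one, stopping at the first failure.
 * Returns the number of base pages that reached the target node.
 */
size_t sketch_demote_thp(struct sketch_node *target)
{
	size_t moved = 0;

	assert(target != NULL);

	if (sketch_migrate_huge(target) == 0)
		return SKETCH_THP_NR_PAGES;

	for (size_t i = 0; i < SKETCH_THP_NR_PAGES; i++) {
		if (sketch_migrate_base(target) != 0)
			break;
		moved++;
	}
	return moved;
}
```

The point of the model is that the caller never needs its own
split-and-retry step: the split happens only after the whole-THP
attempt has failed, and any pages that still cannot be moved are left
for normal reclaim.
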
> +
> + if (rc == MIGRATEPAGE_SUCCESS) {
> + unlock_page(page);
> + if (likely(put_page_testzero(page)))
> + goto free_it;
> + /*
> + * Speculative reference will free this page,
> + * so leave it off the LRU.
> + */
> + nr_reclaimed++;
> + continue;
> + }
> + }
> +
> /*
> * Anonymous process memory has backing store?
> * Try to allocate it some swap space here.
> _
>