Re: [RFC PATCH 2/2] mm/migrate: wait for folio refcount during longterm pin migration
From: Alistair Popple
Date: Tue Apr 21 2026 - 01:57:47 EST
On 2026-04-10 at 13:23 +1000, John Hubbard <jhubbard@xxxxxxxxxx> wrote...
> When migrating pages for FOLL_LONGTERM pinning (MR_LONGTERM_PIN), the
> migration can fail with -EAGAIN if the folio has unexpected references.
> These references are often transient (e.g., from GPU operations like
> cuMemset that will complete shortly).
Is there a reason this logic should only apply to FOLL_LONGTERM pinning?
Or could it also apply more generally to any ZONE_MOVABLE page, for which
migration should eventually succeed? Currently that case uses the same
retry logic: try NR_MAX_MIGRATE_PAGES_RETRY times, then give up.
We have similar retry problems in mm/migrate_device.c:migrate_vma_*(), so I
could see something similar being useful there as well.
- Alistair
> Previously, the migration code would retry up to 10 times
> (NR_MAX_MIGRATE_PAGES_RETRY), but this busy-retry approach failed when
> the transient reference holder needed more time than the retry loop
> provided.
>
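A minimal userspace sketch of why busy retries lose to a slow reference holder while waiting between attempts does not. It models a single folio on a simulated microsecond clock. NR_RETRIES stands in for NR_MAX_MIGRATE_PAGES_RETRY; the attempt cost, release times and the function names (busy_retry(), wait_retry(), ref_released()) are illustrative assumptions, not measurements and not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_RETRIES 10	/* stands in for NR_MAX_MIGRATE_PAGES_RETRY */

/* Has the transient holder dropped its reference by time now (in us)? */
static bool ref_released(long now, long release_at)
{
	return now >= release_at;
}

/* Busy retry: attempts run back to back, each costing attempt_cost us. */
static bool busy_retry(long release_at, long attempt_cost)
{
	long now = 0;

	assert(attempt_cost > 0);
	for (int i = 0; i < NR_RETRIES; i++) {
		if (ref_released(now, release_at))
			return true;	/* migration would succeed */
		now += attempt_cost;
	}
	return false;			/* retries exhausted, give up */
}

/*
 * Wait-based retry: after a failed attempt, sleep until the holder lets go
 * (modelling the wakeup) or until timeout us pass, whichever is first.
 */
static bool wait_retry(long release_at, long attempt_cost, long timeout)
{
	long now = 0;

	assert(attempt_cost > 0 && timeout >= 0);
	for (int i = 0; i < NR_RETRIES; i++) {
		if (ref_released(now, release_at))
			return true;
		now += attempt_cost;
		if (!ref_released(now, release_at)) {
			long deadline = now + timeout;

			now = release_at < deadline ? release_at : deadline;
		}
	}
	return false;
}
```

With a 5 us attempt cost, busy_retry(300000, 5) gives up on a reference held for 300 ms after spending only 50 us, while wait_retry(300000, 5, 1000000) succeeds on its second attempt. With a one-second wait per attempt, the same ten attempts cover roughly ten seconds of holder time.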
> Fix this by waiting up to one second for the folio's refcount to drop
> to the expected value before retrying migration. The wait uses
> wait_var_event_timeout() paired with the wake_up_var() calls added to
> folio_put() in the previous commit. If the timeout expires, the
> existing retry loop continues as before. The folio_put_wakeup_key
> static key is enabled for the duration of migrate_pages() so that
> folio_put() only wakes waiters when migration is active.
>
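The wait/wakeup pairing described in the paragraph above can be modelled in userspace with a mutex and a condition variable. This is only an analogy under stated assumptions: model_put(), model_wait_refs(), wakeup_users and demo_wait() are hypothetical stand-ins for folio_put(), wait_var_event_timeout() and folio_put_wakeup_key, and the real kernel primitives use hashed wait queues and memory barriers rather than a per-object mutex. What it illustrates is that the waker publishes the new refcount before waking anyone, and that the waiter re-checks the condition after every wakeup and once more at the deadline.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Stand-in for a folio: only a refcount and something to wait on. */
struct model_folio {
	atomic_int refcount;
	pthread_mutex_t lock;
	pthread_cond_t cond;
};

/* Stand-in for folio_put_wakeup_key: > 0 while a migration may wait. */
static atomic_int wakeup_users;

/* Stand-in for folio_put(): drop a reference, wake waiters if enabled. */
static void model_put(struct model_folio *f)
{
	atomic_fetch_sub(&f->refcount, 1);
	if (atomic_load(&wakeup_users) > 0) {
		/* Taking the lock orders this wakeup after the waiter's check. */
		pthread_mutex_lock(&f->lock);
		pthread_cond_broadcast(&f->cond);
		pthread_mutex_unlock(&f->lock);
	}
}

/* Stand-in for wait_var_event_timeout(): true if refcount <= expected in time. */
static bool model_wait_refs(struct model_folio *f, int expected, long timeout_ms)
{
	struct timespec deadline;
	bool ok;

	assert(timeout_ms >= 0);
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec += 1;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&f->lock);
	/* Re-check after every wakeup, since wakeups may be spurious. */
	while (!(ok = atomic_load(&f->refcount) <= expected)) {
		if (pthread_cond_timedwait(&f->cond, &f->lock, &deadline) == ETIMEDOUT) {
			ok = atomic_load(&f->refcount) <= expected;
			break;
		}
	}
	pthread_mutex_unlock(&f->lock);
	return ok;
}

/* A transient holder (think: an in-flight GPU operation) that lets go later. */
struct delayed_put {
	struct model_folio *f;
	long delay_ms;
};

static void *delayed_put_fn(void *arg)
{
	struct delayed_put *dp = arg;
	struct timespec ts = {
		.tv_sec = dp->delay_ms / 1000,
		.tv_nsec = (dp->delay_ms % 1000) * 1000000L,
	};

	nanosleep(&ts, NULL);
	model_put(dp->f);
	return NULL;
}

/* One wait: refcount 2, expected 1, the holder drops after delay_ms. */
static bool demo_wait(long delay_ms, long timeout_ms)
{
	struct model_folio f = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.cond = PTHREAD_COND_INITIALIZER,
	};
	struct delayed_put dp = { .f = &f, .delay_ms = delay_ms };
	pthread_t holder;
	bool ok;
	int rc;

	atomic_init(&f.refcount, 2);
	atomic_fetch_add(&wakeup_users, 1);	/* like static_branch_inc() */
	rc = pthread_create(&holder, NULL, delayed_put_fn, &dp);
	assert(rc == 0);
	(void)rc;
	ok = model_wait_refs(&f, 1, timeout_ms);
	pthread_join(holder, NULL);
	atomic_fetch_sub(&wakeup_users, 1);	/* like static_branch_dec() */
	pthread_cond_destroy(&f.cond);
	pthread_mutex_destroy(&f.lock);
	return ok;
}
```

demo_wait(50, 1000) returns true because the holder lets go well inside the one-second budget; demo_wait(500, 100) returns false, and the caller would fall back to the existing retry loop. If wakeup_users were 0, model_put() would skip the broadcast and the waiter would only notice the drop at its deadline, which is why the key has to be enabled before any waiting starts.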
> Signed-off-by: John Hubbard <jhubbard@xxxxxxxxxx>
> ---
> mm/migrate.c | 30 ++++++++++++++++++++++++++++++
> 1 file changed, 30 insertions(+)
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 2c3d489ecf51..a5d9f85aa376 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -47,6 +47,8 @@
> #include <asm/tlbflush.h>
>
> #include <trace/events/migrate.h>
> +#include <linux/jump_label.h>
> +#include <linux/wait_bit.h>
>
> #include "internal.h"
> #include "swap.h"
> @@ -1732,6 +1734,17 @@ static void migrate_folios_move(struct list_head *src_folios,
> *retry += 1;
> *thp_retry += is_thp;
> *nr_retry_pages += nr_pages;
> + /*
> + * For longterm pinning, wait for references
> + * to be released before retrying.
> + */
> + if (reason == MR_LONGTERM_PIN) {
> + int expected = folio_expected_ref_count(folio) + 1;
> +
> + wait_var_event_timeout(&folio->_refcount,
> + folio_ref_count(folio) <= expected,
> + HZ);
> + }
> break;
> case 0:
> stats->nr_succeeded += nr_pages;
> @@ -1941,6 +1954,17 @@ static int migrate_pages_batch(struct list_head *from,
> retry++;
> thp_retry += is_thp;
> nr_retry_pages += nr_pages;
> + /*
> + * For longterm pinning, wait for references
> + * to be released.
> + */
> + if (reason == MR_LONGTERM_PIN) {
> + int expected = folio_expected_ref_count(folio) + 1;
> +
> + wait_var_event_timeout(&folio->_refcount,
> + folio_ref_count(folio) <= expected,
> + HZ);
> + }
> break;
> case 0:
> list_move_tail(&folio->lru, &unmap_folios);
> @@ -2085,6 +2109,9 @@ int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
>
> memset(&stats, 0, sizeof(stats));
>
> + if (reason == MR_LONGTERM_PIN)
> + static_branch_inc(&folio_put_wakeup_key);
> +
> rc_gather = migrate_hugetlbs(from, get_new_folio, put_new_folio, private,
> mode, reason, &stats, &ret_folios);
> if (rc_gather < 0)
> @@ -2137,6 +2164,9 @@ int migrate_pages(struct list_head *from, new_folio_t get_new_folio,
> if (!list_empty(from))
> goto again;
> out:
> + if (reason == MR_LONGTERM_PIN)
> + static_branch_dec(&folio_put_wakeup_key);
> +
> /*
> * Put the permanent failure folio back to migration list, they
> * will be put back to the right list by the caller.
> --
> 2.53.0
>
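On the static key bracketing in migrate_pages(): static_branch_inc() and static_branch_dec() are reference counted, so concurrent longterm-pin migrations compose correctly as long as every inc is matched by a dec on every exit path (the patch places the dec at out:). The sketch below is a single-threaded model of just that counting property. The struct and function names are hypothetical, and it ignores the code patching and locking that the real static key implementation performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Counted enable, modelling static_branch_inc()/static_branch_dec(). */
struct counted_key {
	int count;
};

static void key_inc(struct counted_key *k)
{
	k->count++;
}

static void key_dec(struct counted_key *k)
{
	assert(k->count > 0);	/* every dec must pair with an earlier inc */
	k->count--;
}

static bool key_enabled(const struct counted_key *k)
{
	return k->count > 0;
}

/* The alternative being rejected: a plain on/off flag. */
struct plain_flag {
	bool on;
};

/*
 * Two overlapping MR_LONGTERM_PIN migrations, A and B: A enables, B enables,
 * A finishes. Is the wakeup path still enabled while B may be waiting?
 */
static bool overlap_counted(void)
{
	struct counted_key k = { 0 };

	key_inc(&k);	/* A enters migrate_pages() */
	key_inc(&k);	/* B enters migrate_pages() */
	key_dec(&k);	/* A reaches out: */
	return key_enabled(&k);
}

static bool overlap_plain(void)
{
	struct plain_flag f = { false };

	f.on = true;	/* A enters */
	f.on = true;	/* B enters */
	f.on = false;	/* A leaves and clears the flag under B */
	return f.on;
}
```

overlap_counted() returns true (B still gets wakeups) while overlap_plain() returns false, which is the failure a non-counted flag would introduce.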