Re: [RFC PATCH 0/2] mm/migrate: wait for folio refcount during longterm pin migration

From: Alistair Popple

Date: Tue Apr 21 2026 - 01:52:32 EST

On 2026-04-10 at 13:23 +1000, John Hubbard <jhubbard@xxxxxxxxxx> wrote...
> Hi,
>
> This adds a bounded sleep to migration so that FOLL_LONGTERM pinning can
> wait for transient folio references to drain, instead of failing after a
> fixed number of retries. The wait uses a one-second timeout. An
> alternative approach would be to call wait_var_event_killable() with no
> timeout, but that doesn't match as well with migration's "this will
> probably work" API. In other words, a short sleeping wait is more
> appropriate here.

This is much better than retrying $RANDOM times. It also seems to provide a
nice definition of transient vs. longterm pins: any pin held longer than the
migration timeout would be longterm.

> When migrating pages for FOLL_LONGTERM pinning, migration can fail with
> -EAGAIN if a folio has unexpected references. These references are often
> transient, but the current retry loop gives up too quickly. This series
> adds wait_var_event_timeout() at the retry points, paired with
> wake_up_var() in folio_put() to wake the sleeper as soon as the refcount
> drops.

Nothing wrong with the above, just a minor nit to check my understanding.
FOLL_LONGTERM causing migration implies the memory is in ZONE_MOVABLE, and the
aim of ZONE_MOVABLE is that memory is always movable. That implies any
unexpected page references should *always* be transient, not just "often"
transient. At least that's my understanding, assuming drivers are behaving.
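
As a rough illustration of the wait/wake pairing described in the quoted
paragraph, here is a userspace analogue using pthreads. It is only a sketch and
not the patch's code: struct counted_obj and the counted_obj_*() helpers are
hypothetical names, the condition variable stands in for the
wait_var_event_timeout()/wake_up_var() machinery, and the "put" side wakes
waiters unconditionally (no static key).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for a folio: just a counted object. */
struct counted_obj {
	pthread_mutex_t lock;
	pthread_cond_t drained;	/* plays the role of the var waitqueue */
	int refcount;
};

static void counted_obj_init(struct counted_obj *o, int refs)
{
	pthread_mutex_init(&o->lock, NULL);
	pthread_cond_init(&o->drained, NULL);
	o->refcount = refs;
}

/* Analogue of folio_put() + wake_up_var(): drop a ref, wake any sleeper. */
static void counted_obj_put(struct counted_obj *o)
{
	pthread_mutex_lock(&o->lock);
	o->refcount--;
	pthread_cond_broadcast(&o->drained);
	pthread_mutex_unlock(&o->lock);
}

/*
 * Analogue of wait_var_event_timeout(): sleep until the refcount drops to
 * 'expected' or the timeout expires. Returns true if the condition was met.
 */
static bool counted_obj_wait(struct counted_obj *o, int expected,
			     long timeout_ms)
{
	struct timespec deadline;
	bool ok;

	/* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&o->lock);
	while (o->refcount > expected) {
		if (pthread_cond_timedwait(&o->drained, &o->lock,
					   &deadline) == ETIMEDOUT)
			break;
	}
	ok = o->refcount <= expected;
	pthread_mutex_unlock(&o->lock);
	return ok;
}
```

With an object holding one extra reference, counted_obj_wait() times out and
returns false; once counted_obj_put() drops that reference, the same wait
returns true without sleeping for the full timeout. In this model, a pin that
outlives the timeout is what would count as "longterm".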

> The wake_up_var() calls in folio_put() are gated behind a static key,
> disabled by default, so non-migration workloads pay zero cost.
> migrate_pages() enables the key on entry when the reason is
> MR_LONGTERM_PIN, and disables it on exit.
>
> Toggling the key is not free. folio_put() is static inline, so every
> compilation unit that calls it gets its own patch site (roughly 500 in
> vmlinux, plus modules). On x86, jump label patching is batched (256
> sites per batch, 3 IPI rounds per batch), so enabling the key costs
> 6-9 IPI broadcasts, a few hundred microseconds on a large machine.
> That cost is paid twice per migrate_pages() call. Migration itself
> spends several milliseconds per batch on LRU isolation, TLB flushes,
> and page copies. Concurrent longterm-pin migrations after the first
> just do an atomic_inc (no patching).
>
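
The "6-9 IPI broadcasts" figure above can be reproduced from the quoted
numbers with a ceiling division. This is only a sketch of the arithmetic:
ipi_broadcasts() is a hypothetical helper, and the 600-site case is an assumed
example of modules pushing the total past two batches.

```c
#include <assert.h>

/*
 * Number of IPI broadcasts needed to patch 'sites' jump-label sites when
 * patching is done in batches of 'batch_size', each batch costing
 * 'rounds_per_batch' IPI rounds.
 */
static unsigned int ipi_broadcasts(unsigned int sites,
				   unsigned int batch_size,
				   unsigned int rounds_per_batch)
{
	unsigned int batches;

	assert(batch_size > 0);
	/* Round up: a partially filled batch still costs a full set of rounds. */
	batches = (sites + batch_size - 1) / batch_size;
	return batches * rounds_per_batch;
}
```

With the quoted values, ipi_broadcasts(500, 256, 3) is 2 batches times 3
rounds, i.e. 6. Anything past 512 sites (say 600 with modules loaded) needs a
third batch, giving 9.
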
> Matthew Brost offered to performance-test this series [1], as Intel has
> tests that stress migration and good metrics to catch regressions.
>
> [1] https://lore.kernel.org/all/aX+oUorOWPt1xbgw@xxxxxxxxxxxxxxxxxxxxxxxxx/
>
> John Hubbard (2):
> mm: wake up folio refcount waiters on folio_put()
> mm/migrate: wait for folio refcount during longterm pin migration
>
> include/linux/mm.h | 8 ++++++++
> mm/migrate.c | 30 ++++++++++++++++++++++++++++++
> mm/swap.c | 10 +++++++++-
> 3 files changed, 47 insertions(+), 1 deletion(-)
>
>
> base-commit: 9a9c8ce300cd3859cc87b408ef552cd697cc2ab7
> --
> 2.53.0
>