Re: [RFC PATCH 4/5] mm: swap: fall back to order-0 after large swapin races

From: Kairui Song

Date: Mon May 11 2026 - 11:08:20 EST


On Mon, May 11, 2026 at 9:14 PM David Hildenbrand (Arm)
<david@xxxxxxxxxx> wrote:
>
> On 5/8/26 22:20, fujunjie wrote:
> > swapin_folio() documents that a large folio insertion race returns NULL
> > so the caller can fall back to order-0 swapin. do_swap_page() currently
> > turns that NULL into VM_FAULT_OOM if the PTE is unchanged, which is
> > harsher than necessary and gets in the way of rejecting large folio
> > ranges for backend reasons.
> >
> > Move the synchronous swapin sequence into a helper and retry with an
> > order-0 folio when a large folio cannot be inserted into the swap cache.
> > Count the event as an mTHP swapin fallback before dropping the failed
> > large allocation.
> >
> > Signed-off-by: fujunjie <fujunjie1@xxxxxx>
> > ---
> > mm/memory.c | 50 +++++++++++++++++++++++++++++++++++++++-----------
> > 1 file changed, 39 insertions(+), 11 deletions(-)
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index ea6568571131..84e3b77b8293 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4757,6 +4757,44 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf)
> > }
> > #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >
> > +static struct folio *swapin_synchronous_folio(swp_entry_t entry,
> > +					      struct vm_fault *vmf)
> > +{
> > +	struct folio *swapcache, *folio;
> > +	bool large;
> > +	int order;
> > +
> > +	folio = alloc_swap_folio(vmf);
> > +	if (!folio)
> > +		return NULL;
> > +
> > +	large = folio_test_large(folio);
> > +	order = folio_order(folio);
> > +
> > +	/*
> > +	 * folio is charged, so swapin can only fail due to raced swapin and
> > +	 * return NULL.
> > +	 */
> > +	swapcache = swapin_folio(entry, folio);
> > +	if (swapcache == folio)
> > +		return folio;
> > +
> > +	if (!swapcache && large)
> > +		count_mthp_stat(order, MTHP_STAT_SWPIN_FALLBACK);
> > +	folio_put(folio);
> > +	if (swapcache || !large)
> > +		return swapcache;
> > +
> > +	folio = __alloc_swap_folio(vmf);
> > +	if (!folio)
> > +		return NULL;
> > +
> > +	swapcache = swapin_folio(entry, folio);
> > +	if (swapcache != folio)
> > +		folio_put(folio);
> > +	return swapcache;
> > +}
> > +
> > /* Sanity check that a folio is fully exclusive */
> > static void check_swap_exclusive(struct folio *folio, swp_entry_t entry,
> > unsigned int nr_pages)
> > @@ -4860,17 +4898,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >  		swap_update_readahead(folio, vma, vmf->address);
> >  	if (!folio) {
> >  		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
> > -			folio = alloc_swap_folio(vmf);
> > -			if (folio) {
> > -				/*
> > -				 * folio is charged, so swapin can only fail due
> > -				 * to raced swapin and return NULL.
> > -				 */
> > -				swapcache = swapin_folio(entry, folio);
> > -				if (swapcache != folio)
> > -					folio_put(folio);
> > -				folio = swapcache;
> > -			}
> > +			folio = swapin_synchronous_folio(entry, vmf);
> >  		} else {
> >  			folio = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE, vmf);
> >  		}
>
> There are some upcoming changes with:
>
> https://lore.kernel.org/r/20260421-swap-table-p4-v3-5-2f23759a76bc@xxxxxxxxxxx
>
>
> All of that logic you have in swapin_synchronous_folio() should ideally not
> go into memory.c, but into some swap-specific code.
>
> But
>
> https://lore.kernel.org/r/20260421-swap-table-p4-v3-0-2f23759a76bc@xxxxxxxxxxx

Thanks for mentioning this!

I think Junjie's change would indeed fit better on top of that series. I
checked the code, and it should fit in easily too.

It's already strange enough that THP swapin is bundled with
synchronous swapin; we'd better not make it more divergent here, or add
more bits to memory.c.

This commit would also limit the fallback to anon, not shmem, which is
another strange detail. Otherwise we'd have to repeat everything and copy
this code into shmem.c...

Once all swap-ins use basically the same path, as in that series, they
will all be able to have similar THP and zswap THP support too.
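
To make the fallback under discussion concrete, here is a rough userspace
sketch of "try the large folio first, then retry at order-0 if the insert
raced". Every name in it (fake_folio, fake_swap_cache, fake_insert,
fake_swapin_with_fallback) is hypothetical and only for illustration; none
of it is kernel API. It also simplifies things: the real swapin_folio() can
return a different folio that is already in the swap cache, which is not
modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not kernel types. */
struct fake_folio {
	int order;			/* 0 means a single page */
};

struct fake_swap_cache {
	bool large_insert_races;	/* simulate a racing swapin beating a large insert */
	int fallback_count;		/* stands in for the mTHP swapin fallback counter */
	int inserted_order;		/* order of the folio that got cached, -1 if none */
};

/* Returns the folio on success, or NULL when the simulated race rejects it. */
static struct fake_folio *fake_insert(struct fake_swap_cache *sc,
				      struct fake_folio *folio)
{
	if (folio->order > 0 && sc->large_insert_races)
		return NULL;
	sc->inserted_order = folio->order;
	return folio;
}

/*
 * Try the preferred folio first. If it is large and the insert fails,
 * count one fallback and retry with the order-0 folio. An order-0
 * failure is returned as NULL without a retry.
 */
static struct fake_folio *fake_swapin_with_fallback(struct fake_swap_cache *sc,
						    struct fake_folio *preferred,
						    struct fake_folio *small)
{
	assert(small->order == 0);

	if (fake_insert(sc, preferred))
		return preferred;
	if (preferred->order == 0)
		return NULL;

	sc->fallback_count++;
	return fake_insert(sc, small);
}
```

The point of the sketch is only that this retry logic does not depend on
the fault path at all, which is why it could sit in common swap code and
be shared by anon and shmem rather than living in memory.c.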