Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space

From: Barry Song (Xiaomi)

Date: Fri Jun 19 2026 - 18:42:38 EST


On Sat, Jun 20, 2026 at 3:18 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
> On Fri, Jun 19, 2026 at 6:17 AM Barry Song (Xiaomi) <baohua@xxxxxxxxxx> wrote:
> >
> > When swap is disabled or exhausted, swap slot allocation
> > may fail during swapout, causing large folios to be split
> > into small folios. The splitting is reasonable when we
> > truly fail to obtain contiguous swap slots, but it is
> > pointless in the no-space case.
> >
> > A simple way to reproduce this is to invoke MADV_PAGEOUT on
> > a system with mTHP enabled but without swap configured.
> >
> >  #include <string.h>
> >  #include <sys/mman.h>
> >
> >  #define SIZE (16 * 1024 * 1024)
> >  int main(void)
> >  {
> >          char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> >                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> >          if (buf == MAP_FAILED)
> >                  return 1;
> >          memset(buf, 1, SIZE);
> >          madvise(buf, SIZE, MADV_PAGEOUT);
> >          munmap(buf, SIZE);
> >          return 0;
> >  }
> >
> > With 16KB mTHP enabled, we observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 1024
> >
> > This patch checks swap space before splitting. If there is
> > no available space, it skips splitting. After the patch, we
> > observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 0
> >
> > Reported-by: Nanzhe Zhao <zhaonanzhe@xxxxxxxxxx>
> > Cc: David Hildenbrand <david@xxxxxxxxxx>
> > Cc: Lorenzo Stoakes <ljs@xxxxxxxxxx>
> > Cc: Zi Yan <ziy@xxxxxxxxxx>
> > Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> > Cc: Liam R. Howlett <liam@xxxxxxxxxxxxx>
> > Cc: Nico Pache <npache@xxxxxxxxxx>
> > Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
> > Cc: Dev Jain <dev.jain@xxxxxxx>
> > Cc: Lance Yang <lance.yang@xxxxxxxxx>
> > Cc: Kairui Song <kasong@xxxxxxxxxxx>
> > Cc: Qi Zheng <qi.zheng@xxxxxxxxx>
> > Cc: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> > Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
> > Cc: Yuanchu Xie <yuanchu@xxxxxxxxxx>
> > Cc: Wei Xu <weixugc@xxxxxxxxxx>
> > Signed-off-by: Barry Song (Xiaomi) <baohua@xxxxxxxxxx>
> > ---
> >  mm/vmscan.c | 15 +++++++++++++--
> >  1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 299b5d9e8836..33f84a5fe7ee 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -339,8 +339,7 @@ static bool can_demote(int nid, struct scan_control *sc,
> >         return !nodes_empty(allowed_mask);
> >  }
> >
> > -static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > -                                         int nid,
> > +static inline bool __can_reclaim_anon_pages(struct mem_cgroup *memcg,
> >                                           struct scan_control *sc)
> >  {
> >         if (memcg == NULL) {
> > @@ -356,6 +355,16 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> >                         return true;
> >         }
> >
> > +       return false;
> > +}
> > +
> > +static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > +                                         int nid,
> > +                                         struct scan_control *sc)
> > +{
> > +       if (__can_reclaim_anon_pages(memcg, sc))
> > +               return true;
> > +
> >         /*
> >          * The page can not be swapped.
> >          *
> > @@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >
> >                                 if (!folio_test_large(folio))
> >                                         goto activate_locked_split;
> > +                               if (!__can_reclaim_anon_pages(memcg, sc))
> > +                                       goto activate_locked_split;
> >                                 /* Fallback to swap normal pages */
> >                                 if (split_folio_to_list(folio, folio_list))
> >                                         goto activate_locked;
>
> Hello Barry,
>
> Thanks for raising this issue. I saw a similar issue report in the
> mail list before and was thinking that, perhaps another approach is to

Hi Kairui,

Could you please post a link to that report? I'd like to add the
corresponding Reported-by and Closes tags as well.


> let folio_alloc_swap return a more detailed error code, for example:
>
> - 1. the mem_cgroup_try_charge_swap in it failed
> - 2. allocation failed but nr_swap_pages > folio size
> - 3. allocation failed because all devices are full or unusable
> (roughly nr_swap_pages < folio size)
>

folio_alloc_swap() already returns error codes such as -EAGAIN,
-EINVAL, and -ENOMEM, but for cases 1, 2, and 3 I assume it
would return -ENOMEM every time?

So do you mean that we might want folio_alloc_swap() to
return an enum instead?
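
For discussion, an enum along those lines might look roughly like the
user-space sketch below. Every name in it (swap_alloc_status,
classify_swap_failure(), should_split()) is hypothetical and does not
exist in the tree, and treating nr_swap_pages == folio size as case 2
is an assumption, since the cases above only say "roughly".

```c
#include <stdbool.h>

/* Hypothetical result codes mirroring the three failure cases above. */
enum swap_alloc_status {
	SWAP_ALLOC_OK,
	SWAP_ALLOC_CHARGE_FAILED,	/* case 1: memcg swap charge failed */
	SWAP_ALLOC_FRAGMENTED,		/* case 2: enough free slots, none contiguous */
	SWAP_ALLOC_NO_SPACE,		/* case 3: devices full or unusable */
};

/* Classify a failed allocation; only meant to be called on failure. */
static enum swap_alloc_status
classify_swap_failure(bool charge_failed, long nr_swap_pages,
		      unsigned int order)
{
	if (charge_failed)
		return SWAP_ALLOC_CHARGE_FAILED;
	if (nr_swap_pages < (1L << order))
		return SWAP_ALLOC_NO_SPACE;
	return SWAP_ALLOC_FRAGMENTED;
}

/* Only case 2 can succeed after splitting into smaller folios. */
static bool should_split(enum swap_alloc_status st)
{
	return st == SWAP_ALLOC_FRAGMENTED;
}
```

With something like this, the caller would split only when
should_split() is true and would activate the folio otherwise.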

Another approach is to return -EAGAIN only for the cases where
we want to retry the swap-out after splitting the folio:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 78b49b0658ad..62e2c506ccae 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1755,6 +1755,9 @@ int folio_alloc_swap(struct folio *folio)
                        VM_WARN_ON_ONCE(1);
                        return -EINVAL;
                }
+
+               if (get_nr_swap_pages() < (1 << order))
+                       return -ENOMEM;
        }

 again:
@@ -1769,11 +1772,13 @@ int folio_alloc_swap(struct folio *folio)
        }

        /* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
-       if (unlikely(mem_cgroup_try_charge_swap(folio)))
+       if (unlikely(mem_cgroup_try_charge_swap(folio))) {
                swap_cache_del_folio(folio);
+               return -ENOMEM;
+       }

        if (unlikely(!folio_test_swapcache(folio)))
-               return -ENOMEM;
+               return -EAGAIN;

        return 0;
 }
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 299b5d9e8836..4c4cbd72c013 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1257,6 +1257,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
                 */
                if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
                                !folio_test_swapcache(folio)) {
+                       int ret;
+
                        if (!(sc->gfp_mask & __GFP_IO))
                                goto keep_locked;
                        if (folio_maybe_dma_pinned(folio))
@@ -1275,25 +1277,24 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
                                    split_folio_to_list(folio, folio_list))
                                        goto activate_locked;
                        }
-                       if (folio_alloc_swap(folio)) {
-                               int __maybe_unused order = folio_order(folio);
+                       ret = folio_alloc_swap(folio);
+                       if (!folio_test_large(folio) || ret != -EAGAIN)
+                               goto activate_locked_split;

-                               if (!folio_test_large(folio))
-                                       goto activate_locked_split;
-                               /* Fallback to swap normal pages */
-                               if (split_folio_to_list(folio, folio_list))
-                                       goto activate_locked;
+                       /* Fallback to swap normal pages */
+                       if (split_folio_to_list(folio, folio_list))
+                               goto activate_locked;
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-                               if (nr_pages >= HPAGE_PMD_NR) {
-                                       count_memcg_folio_events(folio,
+                       if (nr_pages >= HPAGE_PMD_NR) {
+                               count_memcg_folio_events(folio,
                                                THP_SWPOUT_FALLBACK, 1);
-                                       count_vm_event(THP_SWPOUT_FALLBACK);
-                               }
-#endif
-                               count_mthp_stat(order, MTHP_STAT_SWPOUT_FALLBACK);
-                               if (folio_alloc_swap(folio))
-                                       goto activate_locked_split;
+                               count_vm_event(THP_SWPOUT_FALLBACK);
                        }
+#endif
+                       count_mthp_stat(folio_order(folio), MTHP_STAT_SWPOUT_FALLBACK);
+                       if (folio_alloc_swap(folio))
+                               goto activate_locked_split;
+
                        /*
                         * Normally the folio will be dirtied in unmap because
                         * its pte should be dirty. A special case is MADV_FREE
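
The decision this approach aims for can be summarised as a small
function. The following is only an illustrative user-space sketch, not
kernel code: swapout_decide() and the enum are made-up names, and it
assumes that a zero return from folio_alloc_swap() bypasses the
fallback path entirely, which a real version of the hunk above would
have to guarantee.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes for the reclaim path; not kernel identifiers. */
enum swapout_action {
	SWAPOUT_PROCEED,	/* slots allocated, carry on with swap-out */
	SWAPOUT_SPLIT_RETRY,	/* split the large folio, then allocate again */
	SWAPOUT_ACTIVATE,	/* give up and keep the folio active */
};

/*
 * ret is the value returned by the first allocation attempt,
 * large says whether the folio is a large folio.
 */
static enum swapout_action swapout_decide(int ret, bool large)
{
	if (ret == 0)
		return SWAPOUT_PROCEED;
	/* Splitting only helps when contiguous slots were the problem. */
	if (large && ret == -EAGAIN)
		return SWAPOUT_SPLIT_RETRY;
	/* No space, charge failure, or a small folio: splitting is pointless. */
	return SWAPOUT_ACTIVATE;
}
```

In this model -ENOMEM never leads to a split, which is what keeps the
split counter at 0 when no swap is configured.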

> Only case 2 requires splitting. __can_reclaim_anon_pages also checks
> demote which is not related to swap.

I actually extracted __can_reclaim_anon_pages(), which only
checks swap, whereas can_reclaim_anon_pages() checks both
swap and demotion. :-)
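
To make the split explicit, here is a simplified user-space model of
the two helpers. struct reclaim_state and both function names are
hypothetical stand-ins; the real helpers take a memcg and a
scan_control and look at more state than this.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state the kernel helpers inspect. */
struct reclaim_state {
	long nr_swap_pages;	/* free swap slots visible to this reclaim */
	bool can_demote;	/* a lower memory tier can take the pages */
};

/* Swap-only check, modelling the role of __can_reclaim_anon_pages(). */
static bool swap_available(const struct reclaim_state *rs)
{
	return rs->nr_swap_pages > 0;
}

/* Swap or demotion, modelling the role of can_reclaim_anon_pages(). */
static bool anon_reclaimable(const struct reclaim_state *rs)
{
	return swap_available(rs) || rs->can_demote;
}
```

A node that can demote but has no swap is still reclaimable in the
second sense, yet the split decision only consults the first check.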

Best Regards
Barry