Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space
From: Barry Song (Xiaomi)
Date: Fri Jun 19 2026 - 18:42:38 EST
On Sat, Jun 20, 2026 at 3:18 AM Kairui Song <ryncsn@xxxxxxxxx> wrote:
>
> On Fri, Jun 19, 2026 at 6:17 AM Barry Song (Xiaomi) <baohua@xxxxxxxxxx> wrote:
> >
> > When swap is disabled or exhausted, swap slot allocation
> > may fail during swapout, causing large folios to be split
> > into small folios. The splitting is reasonable when we
> > truly fail to obtain contiguous swap slots, but it is
> > pointless in the no-space case.
> >
> > A simple way to reproduce this is to invoke MADV_PAGEOUT on
> > a system with mTHP enabled but without swap configured.
> >
> > #define SIZE (16 * 1024 * 1024)
> > int main(void)
> > {
> > char *buf = mmap(NULL, SIZE, PROT_READ | PROT_WRITE,
> > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
> > memset(buf, 1, SIZE);
> > madvise(buf, SIZE, MADV_PAGEOUT);
> > munmap(buf, SIZE);
> > return 0;
> > }
> >
> > With 16KB mTHP enabled, we observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 1024
> >
> > This patch checks swap space before splitting. If there is
> > no available space, it skips splitting. After the patch, we
> > observe:
> > ~ # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/stats/split
> > 0
> >
> > Reported-by: Nanzhe Zhao <zhaonanzhe@xxxxxxxxxx>
> > Cc: David Hildenbrand <david@xxxxxxxxxx>
> > Cc: Lorenzo Stoakes <ljs@xxxxxxxxxx>
> > Cc: Zi Yan <ziy@xxxxxxxxxx>
> > Cc: Baolin Wang <baolin.wang@xxxxxxxxxxxxxxxxx>
> > Cc: Liam R. Howlett <liam@xxxxxxxxxxxxx>
> > Cc: Nico Pache <npache@xxxxxxxxxx>
> > Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
> > Cc: Dev Jain <dev.jain@xxxxxxx>
> > Cc: Lance Yang <lance.yang@xxxxxxxxx>
> > Cc: Kairui Song <kasong@xxxxxxxxxxx>
> > Cc: Qi Zheng <qi.zheng@xxxxxxxxx>
> > Cc: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> > Cc: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
> > Cc: Yuanchu Xie <yuanchu@xxxxxxxxxx>
> > Cc: Wei Xu <weixugc@xxxxxxxxxx>
> > Signed-off-by: Barry Song (Xiaomi) <baohua@xxxxxxxxxx>
> > ---
> > mm/vmscan.c | 15 +++++++++++++--
> > 1 file changed, 13 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 299b5d9e8836..33f84a5fe7ee 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -339,8 +339,7 @@ static bool can_demote(int nid, struct scan_control *sc,
> > return !nodes_empty(allowed_mask);
> > }
> >
> > -static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > - int nid,
> > +static inline bool __can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > struct scan_control *sc)
> > {
> > if (memcg == NULL) {
> > @@ -356,6 +355,16 @@ static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > return true;
> > }
> >
> > + return false;
> > +}
> > +
> > +static inline bool can_reclaim_anon_pages(struct mem_cgroup *memcg,
> > + int nid,
> > + struct scan_control *sc)
> > +{
> > + if (__can_reclaim_anon_pages(memcg, sc))
> > + return true;
> > +
> > /*
> > * The page can not be swapped.
> > *
> > @@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >
> > if (!folio_test_large(folio))
> > goto activate_locked_split;
> > + if (!__can_reclaim_anon_pages(memcg, sc))
> > + goto activate_locked_split;
> > /* Fallback to swap normal pages */
> > if (split_folio_to_list(folio, folio_list))
> > goto activate_locked;
>
> Hello Barry,
>
> Thanks for raising this issue. I saw a similar issue report in the
> mail list before and was thinking that, perhaps another approach is to

Hi Kairui,

Could you please post the link to your report? I'd like to add
your Reported-by and Closes tags as well.

> let folio_alloc_swap return a more detailed error code, for example:
>
> - 1. the mem_cgroup_try_charge_swap in it failed
> - 2. allocation failed but nr_swap_pages > folio size
> - 3. allocation failed because all devices are full or unusable
> (roughly nr_swap_pages < folio size)
>
folio_alloc_swap() currently returns error codes such as -EAGAIN,
-EINVAL, and -ENOMEM, and I assume cases 1, 2, and 3 would all
return -ENOMEM today. Do you mean that we might want
folio_alloc_swap() to return an enum instead?
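
To make the enum idea concrete, here is a rough userspace sketch of
how the three failure cases could be told apart. The names
swap_alloc_result, classify_swap_failure() and worth_splitting() are
hypothetical and exist only for illustration; they are not kernel
interfaces. nr_free_slots stands in for what get_nr_swap_pages()
reports, and the function assumes the allocation has already failed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result type, not an existing kernel enum. */
enum swap_alloc_result {
	SWAP_ALLOC_OK = 0,
	SWAP_ALLOC_MEMCG_LIMIT,	/* case 1: memcg swap charge failed */
	SWAP_ALLOC_FRAGMENTED,	/* case 2: space exists, but not contiguous */
	SWAP_ALLOC_NO_SPACE,	/* case 3: fewer free slots than the folio needs */
};

/*
 * Classify a failed allocation for a folio of the given order.
 * Assumption: the caller already knows the allocation failed.
 */
static enum swap_alloc_result
classify_swap_failure(bool charge_failed, long nr_free_slots, unsigned int order)
{
	assert(order < 31);	/* keep the shift below well defined */

	if (charge_failed)
		return SWAP_ALLOC_MEMCG_LIMIT;
	if (nr_free_slots < (1L << order))
		return SWAP_ALLOC_NO_SPACE;
	return SWAP_ALLOC_FRAGMENTED;
}

/* Only the fragmented case can be helped by splitting the folio. */
static bool worth_splitting(enum swap_alloc_result res)
{
	return res == SWAP_ALLOC_FRAGMENTED;
}
```

With this shape, the caller in shrink_folio_list() would only need to
ask worth_splitting() before falling back to small folios.
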

Another approach is to return -EAGAIN only in the cases where
we want to retry the swap-out after splitting the folio:

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 78b49b0658ad..62e2c506ccae 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1755,6 +1755,9 @@ int folio_alloc_swap(struct folio *folio)
VM_WARN_ON_ONCE(1);
return -EINVAL;
}
+
+ if (get_nr_swap_pages() < (1 << order))
+ return -ENOMEM;
}
again:
@@ -1769,11 +1772,13 @@ int folio_alloc_swap(struct folio *folio)
}
/* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
- if (unlikely(mem_cgroup_try_charge_swap(folio)))
+ if (unlikely(mem_cgroup_try_charge_swap(folio))) {
swap_cache_del_folio(folio);
+ return -ENOMEM;
+ }
if (unlikely(!folio_test_swapcache(folio)))
- return -ENOMEM;
+ return -EAGAIN;
return 0;
}
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 299b5d9e8836..4c4cbd72c013 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1257,6 +1257,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
*/
if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
!folio_test_swapcache(folio)) {
+ int ret;
+
if (!(sc->gfp_mask & __GFP_IO))
goto keep_locked;
if (folio_maybe_dma_pinned(folio))
@@ -1275,25 +1277,24 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
split_folio_to_list(folio, folio_list))
goto activate_locked;
}
- if (folio_alloc_swap(folio)) {
- int __maybe_unused order = folio_order(folio);
+ ret = folio_alloc_swap(folio);
+ if (!folio_test_large(folio) || ret != -EAGAIN)
+ goto activate_locked_split;
- if (!folio_test_large(folio))
- goto activate_locked_split;
- /* Fallback to swap normal pages */
- if (split_folio_to_list(folio, folio_list))
- goto activate_locked;
+ /* Fallback to swap normal pages */
+ if (split_folio_to_list(folio, folio_list))
+ goto activate_locked;
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
- if (nr_pages >= HPAGE_PMD_NR) {
- count_memcg_folio_events(folio,
+ if (nr_pages >= HPAGE_PMD_NR) {
+ count_memcg_folio_events(folio,
THP_SWPOUT_FALLBACK, 1);
- count_vm_event(THP_SWPOUT_FALLBACK);
- }
-#endif
- count_mthp_stat(order, MTHP_STAT_SWPOUT_FALLBACK);
- if (folio_alloc_swap(folio))
- goto activate_locked_split;
+ count_vm_event(THP_SWPOUT_FALLBACK);
}
+#endif
+ count_mthp_stat(folio_order(folio), MTHP_STAT_SWPOUT_FALLBACK);
+ if (folio_alloc_swap(folio))
+ goto activate_locked_split;
+
/*
* Normally the folio will be dirtied in unmap because
* its pte should be dirty. A special case is MADV_FREE
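
The intended caller-side behaviour of the -EAGAIN variant can be
modelled in plain userspace C as below. This is only a sketch of the
decision, not kernel code: swapout_action and swapout_next_step() are
hypothetical names, and ret stands for the value returned by
folio_alloc_swap(). The sketch handles ret == 0 first, which a real
patch would also have to do before testing for -EAGAIN.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the three paths in shrink_folio_list(). */
enum swapout_action {
	SWAPOUT_PROCEED,		/* slots allocated, continue swap-out */
	SWAPOUT_SPLIT_AND_RETRY,	/* split the large folio, then retry */
	SWAPOUT_ACTIVATE,		/* give up: goto activate_locked_split */
};

/*
 * Decide what to do after the first allocation attempt.
 * ret follows the kernel convention: 0 on success, -errno on failure.
 */
static enum swapout_action swapout_next_step(int ret, bool is_large)
{
	assert(ret <= 0);

	if (ret == 0)
		return SWAPOUT_PROCEED;
	/* Splitting only helps a large folio that failed with -EAGAIN. */
	if (is_large && ret == -EAGAIN)
		return SWAPOUT_SPLIT_AND_RETRY;
	/* -ENOMEM (no space, memcg limit) and small folios are not retried. */
	return SWAPOUT_ACTIVATE;
}
```
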
> Only case 2 requires splitting. __can_reclaim_anon_pages also checks
> demote which is not related to swap.
I actually extracted __can_reclaim_anon_pages(), which only
checks swap, whereas can_reclaim_anon_pages() checks both
swap and demotion. :-)
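
To spell out the difference between the two helpers, here is a small
userspace model. The function names swap_has_space() and
anon_reclaimable() and their parameters are made up for illustration:
global_free and memcg_free stand in for the global and per-memcg free
swap counts, has_memcg models memcg != NULL, and can_demote models
the result of can_demote().

```c
#include <assert.h>
#include <stdbool.h>

/* Models __can_reclaim_anon_pages(): looks at swap space only. */
static bool swap_has_space(bool has_memcg, long global_free, long memcg_free)
{
	assert(global_free >= 0 && memcg_free >= 0);

	if (!has_memcg)
		return global_free > 0;
	return memcg_free > 0;
}

/* Models can_reclaim_anon_pages(): swap space first, then demotion. */
static bool anon_reclaimable(bool has_memcg, long global_free,
			     long memcg_free, bool can_demote)
{
	if (swap_has_space(has_memcg, global_free, memcg_free))
		return true;
	return can_demote;
}
```

So on a system with no swap but with a demotion target,
anon_reclaimable() is true while swap_has_space() is false, and only
the latter is consulted before deciding whether to split.
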
Best Regards
Barry