Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space

From: Barry Song

Date: Wed Jun 24 2026 - 19:08:23 EST


On Mon, Jun 22, 2026 at 4:58 PM David Hildenbrand (Arm)
<david@xxxxxxxxxx> wrote:
>
> On 6/20/26 10:10, Barry Song (Xiaomi) wrote:
> > On Fri, Jun 19, 2026 at 10:04 PM David Hildenbrand (Arm) <david@xxxxxxxxxx> wrote:
> > [...]
> >>> /*
> >>> * The page can not be swapped.
> >>> *
> >>> @@ -1280,6 +1289,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> >>>
> >>> if (!folio_test_large(folio))
> >>> goto activate_locked_split;
> >>> + if (!__can_reclaim_anon_pages(memcg, sc))
> >>> + goto activate_locked_split;
> >>
> >> Why are we even trying to allocate swap space if we cannot reclaim such pages?
> >> Makes we wonder whether we would want to have that check earlier, before the
> >> folio_alloc_swap().
> >>
> >> Any downsides?
> >
> > I don't think there are any obvious downsides there. One issue is that
> > the memcg may not be passed from reclaim_pages(), so memcg would
> > always be NULL. However, the folio could still belong to a memcg
> > whose swap quota has been exhausted. In that case, my
> > __can_reclaim_anon_pages() will fail when checking whether we can
> > swap out. But switching to folio_memcg() also seems awkward.
> >
> > So I feel Kairui’s suggestion [1] might be the best approach. In
> > folio_alloc_swap(), we return -EAGAIN to tell vmscan.c that
> > we can split the folio and retry the swap-out.
> > only when there are sufficient swap slots and sufficient memcg swap
> > quota do we return -EAGAIN, allowing vmscan to perform a split.
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 78b49b0658ad..62e2c506ccae 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -1755,6 +1755,9 @@ int folio_alloc_swap(struct folio *folio)
> > VM_WARN_ON_ONCE(1);
> > return -EINVAL;
> > }
> > +
> > + if (get_nr_swap_pages() < (1 << order))
> > + return -ENOMEM;
>
> I guess we could be clearer with the return value:
>
> -> !get_nr_swap_pages() -> -ENOSPC / -ENOMEM
>
> (no space at all)
>
> -> get_nr_swap_pages() < (1 << order) -> -E2BIG
>
> (there is some space, but not for the full thing)

understood. make sense.

>
> But now I wonder whether we would also want to check "is there any free swap
> space", not just "is there any swap".

I don't quite understand you. get_nr_swap_pages() returns
nr_swap_pages, which increases or decreases as swap is allocated or
freed. I guess it just reflects how many swaps we currently have
available?

>
>
> Essentially, try returning -E2BIG if there is the chance to swap out after
> split, and -ENOSPC / -ENOMEM if a split wouldn't help.
>
> > }
> >
> > again:
> > @@ -1769,11 +1772,13 @@ int folio_alloc_swap(struct folio *folio)
> > }
> >
> > /* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
> > - if (unlikely(mem_cgroup_try_charge_swap(folio)))
> > + if (unlikely(mem_cgroup_try_charge_swap(folio))) {
> > swap_cache_del_folio(folio);
> > + return -ENOMEM;
>
> Here we wouldn't have the information whether we could charge after a split.
>
> So that would require a rework to signal this more cleanly to the caller.

Yep. The tricky part is that mem_cgroup_try_charge_swap() cannot
return how much swap quota is available in the memcg. Do you prefer to
add an output argument to mem_cgroup_try_charge_swap() to expose
that, or should we introduce a separate wrapper such as …

long get_nr_swap_pages_from_folio_memcg(struct folio *folio)
{
long int nr;
memcg = get_memcg_from_folio(folio);
nr = mem_cgroup_get_nr_swap_pages(memcg);

return nr;
}

then in folio_alloc_swap(), if nr < folio_nr_pages() but > 0,
we ask for a split by returning -E2BIG ?

>
> > + }
> >
> > if (unlikely(!folio_test_swapcache(folio)))
> > - return -ENOMEM;
> > + return -EAGAIN;
> >
> > return 0;
> > }
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 299b5d9e8836..63e8578454ea 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1257,6 +1257,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > */
> > if (folio_test_anon(folio) && folio_test_swapbacked(folio) &&
> > !folio_test_swapcache(folio)) {
> > + int ret;
> > +
> > if (!(sc->gfp_mask & __GFP_IO))
> > goto keep_locked;
> > if (folio_maybe_dma_pinned(folio))
> > @@ -1275,10 +1277,10 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
> > split_folio_to_list(folio, folio_list))
> > goto activate_locked;
> > }
> > - if (folio_alloc_swap(folio)) {
> > + if ((ret = folio_alloc_swap(folio))) {
>
> I prefer doing the assignment outside the conditional.

Ok.

>
> > int __maybe_unused order = folio_order(folio);
> >
> > - if (!folio_test_large(folio))
> > + if (!folio_test_large(folio) || ret != -EAGAIN)
> > goto activate_locked_split;
> > /* Fallback to swap normal pages */
> > if (split_folio_to_list(folio, folio_list))
> >
> > What’s your view on this, David?
>
> I guess returning from folio_alloc_swap() whether a split could allow for
> swapout (e.g., -E2BIG) would be reasonable.
>
> To catch all the cases where it makes a difference:
> * No free swap space (split won't work)
> * Some free swap space (split would work)
> * Sufficient free swap space, but fragmented (split would work)
> * No memcg space (split won't work)
> * Some memcg space (split would work)

Right. The memcg part is the tricky one, since we don’t have that
information available.

Best Regards
Barry