Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space

From: Johannes Weiner

Date: Thu Jun 25 2026 - 09:36:41 EST

On Thu, Jun 25, 2026 at 09:49:56AM +0200, David Hildenbrand (Arm) wrote:
> >>
> >> But now I wonder whether we would also want to check "is there any free swap
> >> space", not just "is there any swap".
> >
> > I don't quite understand you. get_nr_swap_pages() returns
> > nr_swap_pages, which increases or decreases as swap is allocated or
> > freed. I guess it just reflects how many swaps we currently have
> > available?
>
> Indeed, I was confused by the function name; it is "free swap pages". So all good :)
>
> >
> >>
> >>
> >> Essentially, try returning -E2BIG if there is the chance to swap out after
> >> split, and -ENOSPC / -ENOMEM if a split wouldn't help.
> >>
> >>> }
> >>>
> >>> again:
> >>> @@ -1769,11 +1772,13 @@ int folio_alloc_swap(struct folio *folio)
> >>> }
> >>>
> >>> /* Need to call this even if allocation failed, for MEMCG_SWAP_FAIL. */
> >>> - if (unlikely(mem_cgroup_try_charge_swap(folio)))
> >>> + if (unlikely(mem_cgroup_try_charge_swap(folio))) {
> >>> swap_cache_del_folio(folio);
> >>> + return -ENOMEM;
> >>
> >> Here we wouldn't have the information whether we could charge after a split.
> >>
> >> So that would require a rework to signal this more cleanly to the caller.
> >
> > Yep. The tricky part is that mem_cgroup_try_charge_swap() cannot
> > return how much swap quota is available in the memcg. Do you prefer to
> > add an output argument to mem_cgroup_try_charge_swap() to expose
> > that?
>
> That would probably be cleanest, if that is easily possible. We would want to
> get memcg maintainer feedback on that.
>
> @memcg folks: we'd like to know whether splitting a large folio would make
> mem_cgroup_try_charge_swap() succeed on a split (smaller) part, to distinguish
> "there is no way we can swap out anything, don't split" vs. "we could swap out,
> split".

It's technically doable, but is this worth the bother? The remaining
headroom is less than a large folio. You can split this one, but you
cannot even swap out all of its subpages anymore? From the cgroup
side, we don't need the limit to be obeyed this rigidly. We overcharge
temporarily in other places if it's convenient to do so. A fuzz factor
around the limit is acceptable.

But if you still want to do it, here is how:

The page_counter_try_charge() in __mem_cgroup_try_charge_swap() walks
the hierarchy upwards. If it fails, it will store the first level that
failed against its limit. You can do the mem_cgroup_margin() math
against this counter to determine headroom. An ancestor *could* be
more restrictive, so you need to finish the hierarchy walk to the root
and take the min() of swap.max - page_counter_read(swap) across all
levels. Then pass that back through an output argument of
__mem_cgroup_try_charge_swap().
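
Roughly, the walk would look like the userspace sketch below. To be clear,
this is a toy model and not kernel code: struct toy_counter, toy_margin(),
toy_min_headroom() and toy_try_charge() are made-up names that only mimic
the shape of a page_counter hierarchy (usage, max, parent). It assumes
limits and usage are in pages and ignores locking and atomics entirely.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for one level of a counter hierarchy. */
struct toy_counter {
	unsigned long usage;		/* pages currently charged */
	unsigned long max;		/* limit in pages (think swap.max) */
	struct toy_counter *parent;	/* NULL at the root */
};

/* Headroom at a single level: max - usage, clamped at zero. */
static unsigned long toy_margin(const struct toy_counter *c)
{
	return c->max > c->usage ? c->max - c->usage : 0;
}

/* Smallest headroom from @from up to the root. */
static unsigned long toy_min_headroom(const struct toy_counter *from)
{
	unsigned long min = ULONG_MAX;

	for (; from; from = from->parent) {
		unsigned long m = toy_margin(from);

		if (m < min)
			min = m;
	}
	return min;
}

/*
 * Charge @nr pages at every level from @c to the root.
 * Returns 0 on success. On failure, undoes the partial charge,
 * stores the remaining headroom in @headroom and returns -1.
 */
static int toy_try_charge(struct toy_counter *c, unsigned long nr,
			  unsigned long *headroom)
{
	struct toy_counter *pos, *fail = NULL;

	for (pos = c; pos; pos = pos->parent) {
		if (toy_margin(pos) < nr) {
			fail = pos;	/* first level that hit its limit */
			break;
		}
		pos->usage += nr;
	}
	if (!fail)
		return 0;

	/* Roll back the levels below the failing one. */
	for (pos = c; pos != fail; pos = pos->parent)
		pos->usage -= nr;

	/*
	 * Levels below @fail had at least @nr of room, so the minimum
	 * from @fail upwards is the minimum over the whole path.
	 */
	*headroom = toy_min_headroom(fail);
	return -1;
}
```

The caller could then compare the reported headroom against the size of a
split part to decide between "split and retry" and "don't bother". Whether
that is worth the extra plumbing is the question above.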