Re: [RFC PATCH] mm: Avoiding split large folios if swap has no space

From: David Hildenbrand (Arm)

Date: Thu Jun 25 2026 - 09:46:11 EST


On 6/25/26 15:36, Johannes Weiner wrote:
> On Thu, Jun 25, 2026 at 09:49:56AM +0200, David Hildenbrand (Arm) wrote:
>>>
>>> I don't quite understand you. get_nr_swap_pages() returns
>>> nr_swap_pages, which increases or decreases as swap is allocated or
>>> freed. I guess it just reflects how many swaps we currently have
>>> available?
>>
>> Indeed, I was confused by the function name; it's "free swap pages". So all good :)
>>
>>>
>>>
>>> Yep. The tricky part is that mem_cgroup_try_charge_swap() cannot
>>> return how much swap quota is available in the memcg. Do you prefer to
>>> add an output argument to mem_cgroup_try_charge_swap() to expose
>>> that?
>> That would probably be cleanest, if that is easily possible. We would want to
>> get memcg maintainer feedback on that.
>>
>> @memcg folks: we'd like to know whether splitting a large folio would make
>> mem_cgroup_try_charge_swap() succeed on a split (smaller) part, to distinguish
>> "there is no way we can swap out anything, don't split" vs. "we could swap out,
>> split".
>
> It's technically doable, but is this worth the bother? The remaining
> headroom is less than a large folio. You can split this one, but you
> cannot even swap out all of its subpages anymore?

I was asking myself the same, but when we think in terms of THPs on arm64 with
64k pages, we're in the range of double-digit MiBs.

> From the cgroup
> side, we don't need the limit to be obeyed this rigidly. We overcharge
> temporarily in other places if it's convenient to do so. A fuzz factor
> around the limit is acceptable.

Thanks for that information.

>
> But if you still want to do it, here is how:
>
> The page_counter_try_charge() in __mem_cgroup_try_charge_swap() walks
> the hierarchy upwards. If it fails, it will store the first level that
> failed against its limit. You can do the mem_cgroup_margin() math
> against this counter to determine headroom. An ancestor *could* be
> more restrictive, so you need to finish the hierarchy walk to the root
> and use the min() of all the swap.max - page_counter_read(swap). Then
> return that in a return argument from __mem_cgroup_try_charge_swap().

Thanks! @Barry, up to you whether we want to implement that right away or
whether we simply assume that if charging fails, splitting is not worth it
(changing the existing handling, IIUC).
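
For illustration only, the hierarchy walk Johannes describes could be sketched
in userspace C roughly as below. This is not kernel code: struct toy_counter,
toy_swap_headroom() and toy_try_charge_swap() are made-up stand-ins for
page_counter and __mem_cgroup_try_charge_swap(), and locking, atomics and
hierarchical protection are all ignored.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical toy stand-in for the kernel's struct page_counter. */
struct toy_counter {
	unsigned long usage;		/* pages currently charged */
	unsigned long max;		/* limit in pages, think swap.max */
	struct toy_counter *parent;	/* NULL at the root */
};

/*
 * Walk from @c up to the root and return the smallest remaining
 * headroom (max - usage) seen on the way. An ancestor may be more
 * restrictive than the level we start at, so the walk must not stop early.
 */
static unsigned long toy_swap_headroom(const struct toy_counter *c)
{
	unsigned long headroom = ULONG_MAX;

	for (; c; c = c->parent) {
		unsigned long margin = c->usage < c->max ? c->max - c->usage : 0;

		if (margin < headroom)
			headroom = margin;
	}
	return headroom;
}

/*
 * Try to charge @nr pages against @c and all of its ancestors.
 * Returns 0 on success and -1 on failure. In both cases *@avail holds
 * the headroom that was available before the attempt, so a caller can
 * tell "nothing fits, don't split" from "a smaller piece would fit".
 */
static int toy_try_charge_swap(struct toy_counter *c, unsigned long nr,
			       unsigned long *avail)
{
	*avail = toy_swap_headroom(c);
	if (nr > *avail)
		return -1;

	for (; c; c = c->parent)
		c->usage += nr;
	return 0;
}
```

With a child at 10/64 pages under a root at 90/100 pages, charging 16 pages
fails and reports a headroom of 10, because the root is the tighter limit even
though the child alone would have room. A caller could then decide that
splitting makes sense only if the reported headroom is non-zero.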

--
Cheers,

David