Re: [PATCH v4 2/3] mm, swap: allow archs to override SWAP_NR_ORDERS via ARCH_MAX_PMD_ORDER

From: Barry Song

Date: Tue Jun 23 2026 - 01:11:23 EST


On Fri, Jun 19, 2026 at 12:41 PM Ritesh Harjani (IBM)
<ritesh.list@xxxxxxxxx> wrote:
>
> SWAP_NR_ORDERS sizes a few small bounded arrays inside THP swap
> allocator code (nofull/frag cluster lists, percpu_swap_cluster's
> si/offset arrays, next array for rotational device). This currently
> expands to PMD_ORDER+1, which only works when PMD_ORDER is a compile
> time constant.
>
> However on architecture like PowerPC Book3S64, PMD_ORDER is a runtime
> variable which depends upon which MMU is selected (Radix / Hash), so in
> that case, PMD_ORDER cannot be used to size the static arrays.
>
> This patch provides an optional ARCH_MAX_PMD_ORDER (upper-bound)
> override for such architectures. The memory overhead on enabling this
> override is negligible. Even if we make SWAP_NR_ORDERS runtime alloc,
> default slab padding could cause some memory waste. Also we lose the
> per-cpu cacheline benefits (for percpu_swap_cluster) because it might
> cost an extra cacheline indirection overhead in swap_alloc_fast() for
> fetching si[order]/offset[order]. Note that a fully runtime
> SWAP_NR_ORDERS was considered in previous version but was dropped for
> this reason [1]

Do we know the maximum PMD size? On arm64 with a 64 KB base page,
a PMD can be as large as 512 MB:
https://docs.kernel.org/arch/arm64/hugetlbpage.html
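
For reference, the 512 MB figure falls out of simple arithmetic. The sketch
below is only an illustration, not kernel code: it assumes 8-byte page-table
entries (so one table page holds 2^(PAGE_SHIFT - 3) entries), and the helper
names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumption: 8-byte page-table entries, so a single table page of
 * 2^page_shift bytes holds 2^(page_shift - 3) entries. A PMD entry
 * therefore maps 2^(page_shift - 3) base pages.
 * These helpers are hypothetical and are not kernel API.
 */
static inline unsigned int pmd_order_for_page_shift(unsigned int page_shift)
{
	assert(page_shift > 3);
	return page_shift - 3;
}

/* Bytes mapped by one PMD entry: base page size << PMD order. */
static inline uint64_t pmd_size_bytes(unsigned int page_shift)
{
	return (uint64_t)1 << (page_shift + pmd_order_for_page_shift(page_shift));
}

/* Mirrors the PMD_ORDER + 1 sizing described in the quoted patch. */
static inline unsigned int swap_nr_orders(unsigned int page_shift)
{
	return pmd_order_for_page_shift(page_shift) + 1;
}
```

Under that assumption, a 4 KB base page (shift 12) gives order 9 and a 2 MB
PMD, 16 KB (shift 14) gives order 11 and 32 MB, and 64 KB (shift 16) gives
order 13 and 512 MB, so an upper bound covering the 64 KB case would mean 14
orders.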

One concern is that swapping out such a large folio requires a large
amount of I/O, which could incur significant latency before any memory
is actually reclaimed. For this reason, on arm64 we initially enabled
THP_SWAPOUT only for 4 KB base pages:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0637c505f

>
> [1]: https://lore.kernel.org/linuxppc-dev/pl1zdksc.ritesh.list@xxxxxxxxx/
>

Best Regards
Barry