Re: [PATCH v4 2/3] mm, swap: allow archs to override SWAP_NR_ORDERS via ARCH_MAX_PMD_ORDER

From: Ritesh Harjani (IBM)

Date: Tue Jun 23 2026 - 03:05:41 EST


Barry Song <baohua@xxxxxxxxxx> writes:

> On Fri, Jun 19, 2026 at 12:41 PM Ritesh Harjani (IBM)
> <ritesh.list@xxxxxxxxx> wrote:
>>
>> SWAP_NR_ORDERS sizes a few small bounded arrays inside THP swap
>> allocator code (nofull/frag cluster lists, percpu_swap_cluster's
>> si/offset arrays, next array for rotational device). This currently
>> expands to PMD_ORDER+1, which only works when PMD_ORDER is a compile
>> time constant.
>>
>> However on architecture like PowerPC Book3S64, PMD_ORDER is a runtime
>> variable which depends upon which MMU is selected (Radix / Hash), so in
>> that case, PMD_ORDER cannot be used to size the static arrays.
>>
>> This patch provides an optional ARCH_MAX_PMD_ORDER (upper-bound)
>> override for such architectures. The memory overhead on enabling this
>> override is negligible. Even if we make SWAP_NR_ORDERS runtime alloc,
>> default slab padding could cause some memory waste. Also we lose the
>> per-cpu cacheline benefits (for percpu_swap_cluster) because it might
>> cost an extra cacheline indirection overhead in swap_alloc_fast() for
>> fetching si[order]/offset[order]. Note that a fully runtime
>> SWAP_NR_ORDERS was considered in previous version but was dropped for
>> this reason [1]
>
> Do we know the maximum PMD size?

ARCH_MAX_PMD_ORDER will be 8 on PowerPC Book3S64 with a 64K page size:
the Hash MMU with the default 64K page size supports a PMD size of 16MB
(64K << 8).
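
To make the arithmetic concrete, here is a minimal sketch of how a PMD
order maps to a PMD size for a given base page shift. pmd_size_bytes()
is a hypothetical helper for illustration only, not a kernel API; the
inputs are the 64K/order-8 case above and the arm64 64K/512MB case
mentioned in the quoted text below.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel API): number of bytes covered by a
 * PMD mapping, given the base page shift and the PMD order.
 */
static unsigned long long pmd_size_bytes(unsigned int page_shift,
					 unsigned int pmd_order)
{
	return 1ULL << (page_shift + pmd_order);
}

/* 64K base page => page shift 16 */
#define SKETCH_PAGE_SHIFT_64K	16

/* PowerPC Hash, 64K pages: order 8  => 2^(16 + 8)  = 16MB  */
/* arm64, 64K pages:        order 13 => 2^(16 + 13) = 512MB */
```

So an upper bound of 8 gives SWAP_NR_ORDERS = 9 entries per array,
compared to 14 entries if the bound had to cover a 512MB PMD.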

> On arm64 with a 64 KB base page,
> a PMD can be as large as 512 MB:
> https://docs.kernel.org/arch/arm64/hugetlbpage.html
>
> One concern we have is that performing I/O on such a large folio could
> incur significant latency before reclaiming any memory. For this
> reason, on arm64 we initially enabled THP_SWAPOUT only for 4 KB base
> pages:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d0637c505f
>

That's not the case on PowerPC. The max PMD size for Hash will be 16MB.
Also, we still need this patch since Hash or Radix MMU can be chosen at
runtime. So the main problem this patch is trying to solve on PowerPC
Book3S64 is enabling this feature without impacting any other
architecture. Without this patch series we can't enable it, since it
gives build errors.
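
For readers following along, a rough sketch of the idea (not the actual
patch hunk) is below. The struct name percpu_cluster_sketch, the
variable runtime_pmd_order and the helper order_fits() are all
hypothetical, and the runtime value 5 is only illustrative. The point is
that a static array needs a compile time constant, so the array is sized
by the upper bound while the runtime order only limits which entries get
used.

```c
#include <assert.h>

/* Assumption: the arch header provides a compile time upper bound. */
#define ARCH_MAX_PMD_ORDER	8

/* Sketch of the override: prefer the arch bound when one is provided. */
#ifdef ARCH_MAX_PMD_ORDER
#define SWAP_NR_ORDERS	(ARCH_MAX_PMD_ORDER + 1)
#else
#define SWAP_NR_ORDERS	(PMD_ORDER + 1)
#endif

/* Hypothetical stand-in for a structure holding per-order arrays. */
struct percpu_cluster_sketch {
	unsigned long offset[SWAP_NR_ORDERS];	/* needs a constant size */
};

/* Stand-in for a PMD_ORDER that is only known at runtime (value is illustrative). */
static unsigned int runtime_pmd_order = 5;

/* An order is usable if it is within both the runtime order and the array bound. */
static int order_fits(unsigned int order)
{
	return order <= runtime_pmd_order && order < SWAP_NR_ORDERS;
}
```

With a runtime PMD_ORDER and no override, the #else branch is what fails
to compile, because the array size is then not a constant expression.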

>>
>> [1]: https://lore.kernel.org/linuxppc-dev/pl1zdksc.ritesh.list@xxxxxxxxx/
>>
>
> Best Regards
> Barry

Thanks for the review!

-ritesh