Re: [PATCH 00/13] mm: PMD-level swap entries for anonymous THPs

From: Kairui Song

Date: Wed Apr 29 2026 - 06:55:43 EST

On Mon, Apr 27, 2026 at 6:09 PM Usama Arif <usama.arif@xxxxxxxxx> wrote:
>
> When reclaim swaps out a PMD-mapped anonymous THP today, the PMD is
> split into 512 PTE-level swap entries via TTU_SPLIT_HUGE_PMD before
> unmap.
>
> This series introduces a PMD-level swap entry. The huge mapping is
> preserved across the swap round-trip, and do_huge_pmd_swap_page()
> resolves the entire 2 MB region in a single fault on swap-in,

Hi Usama,

Thanks for the work!

> no khugepaged involvement is needed. swap_map metadata is identical

swap_map is gone. Metadata is still per slot, but with PMD-sized
swapout I think we can soon store a swp_tb entry directly in
ci->table (maybe make it a union), so the metadata shrinks
significantly there too. Better to do that later, together with
cluster compaction.

> Core patches:
> 5. PMD swap entry detection (pmd_is_swap_entry,
> softleaf_is_valid_pmd_entry) and per-arch pmd_swp_*exclusive
> helpers (x86/arm64/s390/riscv/loongarch).
> 6. __split_huge_pmd_locked() learns to split a PMD swap entry
> into 512 PTE swap entries, used as the fallback when a
> PMD-order resource is unavailable.
> 7. Fork: copy_huge_non_present_pmd() duplicates the PMD swap entry
> in one folio_dup_swap() call, with GFP_KERNEL retry mirroring
> copy_pte_range().
> 8. Swapoff: unuse_pmd() reads the whole 2 MB folio and reinstalls
> the PMD; falls back to PTE-split + unuse_pte_range() on error.

There is a slight conflict with the swap folio allocation unification,
which should be easy to resolve. Just a heads-up: check the
swap_cache_alloc_folio() helper here:
https://lore.kernel.org/linux-mm/20260421-swap-table-p4-v3-4-2f23759a76bc@xxxxxxxxxxx/

With the patch linked above, we will be able to allocate 2M folios
directly using swap_cache_alloc_folio(orders = BIT(PMD_ORDER)). That
might even help avoid issues with splitting or racing swapin?

The conflict can be resolved from either side. I'll update that series
to drop the forced order-0 fallback and let the caller pass in (orders =
<mTHP order> | BIT(0)) instead.
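
To illustrate the orders-mask idea, here is a minimal userspace sketch,
not kernel code. pick_highest_order() and the "avail" mask are
hypothetical names made up for this example, PMD_ORDER = 9 assumes 4 KiB
base pages with a 2 MiB PMD, and none of this claims to match what
swap_cache_alloc_folio() does internally. It only shows how a caller
that passes BIT(PMD_ORDER) alone gets no fallback, while one that adds
BIT(0) does.

```c
#include <assert.h>

#define BIT(n)     (1UL << (n))
#define PMD_ORDER  9  /* assumption: 4 KiB base pages, 2 MiB PMD */

/*
 * Hypothetical helper: return the highest order set in both @orders
 * (orders the caller accepts) and @avail (orders that can currently be
 * satisfied), or -1 if the two masks share no order.
 */
static int pick_highest_order(unsigned long orders, unsigned long avail)
{
    unsigned long usable = orders & avail;
    int order;

    /* The caller must request at least one order. */
    assert(orders != 0);

    for (order = (int)(8 * sizeof(usable)) - 1; order >= 0; order--)
        if (usable & BIT(order))
            return order;
    return -1;
}
```

With orders = BIT(PMD_ORDER) the helper either returns PMD_ORDER or
fails, which is what a PMD-level swap-in path would want. With orders =
BIT(PMD_ORDER) | BIT(0) it returns 0 when no PMD-order resource is
available, so the fallback is chosen by the caller and not forced by
the helper.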