Re: [PATCH 00/13] mm: PMD-level swap entries for anonymous THPs

From: Usama Arif

Date: Thu Apr 30 2026 - 06:39:24 EST

On 29/04/2026 11:44, Kairui Song wrote:
> On Mon, Apr 27, 2026 at 6:09 PM Usama Arif <usama.arif@xxxxxxxxx> wrote:
>>
>> When reclaim swaps out a PMD-mapped anonymous THP today, the PMD is
>> split into 512 PTE-level swap entries via TTU_SPLIT_HUGE_PMD before
>> unmap.
>>
>> This series introduces a PMD-level swap entry. The huge mapping is
>> preserved across the swap round-trip, and do_huge_pmd_swap_page()
>> resolves the entire 2 MB region in a single fault on swap-in,
>
> Hi Usama,
>
> Thanks for the work!
>
>> no khugepaged involvement is needed. swap_map metadata is identical
>
> swap_map is gone. Metadata is still per slot, but with PMD-sized
> swapout I think we can soon store a swp_tb entry directly in
> ci->table (maybe as a union), which would significantly reduce the
> metadata too. Better to do that later, together with cluster compaction.
>
>> Core patches:
>> 5. PMD swap entry detection (pmd_is_swap_entry,
>> softleaf_is_valid_pmd_entry) and per-arch pmd_swp_*exclusive
>> helpers (x86/arm64/s390/riscv/loongarch).
>> 6. __split_huge_pmd_locked() learns to split a PMD swap entry
>> into 512 PTE swap entries, used as the fallback when a
>> PMD-order resource is unavailable.
>> 7. Fork: copy_huge_non_present_pmd() duplicates the PMD swap entry
>> in one folio_dup_swap() call, with GFP_KERNEL retry mirroring
>> copy_pte_range().
>> 8. Swapoff: unuse_pmd() reads the whole 2 MB folio and reinstalls
>> the PMD; falls back to PTE-split + unuse_pte_range() on error.
>
> There is a slight conflict with the swap folio allocation unification,
> which should be easy to resolve. Just a little heads-up: check the
> swap_cache_alloc_folio helper here:
> https://lore.kernel.org/linux-mm/20260421-swap-table-p4-v3-4-2f23759a76bc@xxxxxxxxxxx/
>
> We will be able to directly allocate 2M folios using
> swap_cache_alloc_folio(orders = BIT(PMD_ORDER)) in the patch linked
> above. Might that even help avoid issues with splitting or raced swapin?

Oh yeah, I like your swap_cache_alloc_folio a lot more than
swapin_alloc_pmd_folio.

> The conflict can be solved from either side. I'll update that series to
> disable the forced order-0 fallback and let the caller pass in (orders =
> <mTHP order> | BIT(0)) instead.

Yes, that would be great. We don't want the order-0 fallback in the two
cases where we fail in this series.

Thanks!