Re: [RFC 01/12] mm: add PUD THP ptdesc and rmap support
From: Zi Yan
Date: Mon Feb 02 2026 - 11:08:27 EST

On 2 Feb 2026, at 5:44, Kiryl Shutsemau wrote:
> On Sun, Feb 01, 2026 at 04:50:18PM -0800, Usama Arif wrote:
>> For page table management, PUD THPs need to pre-deposit page tables
>> that will be used when the huge page is later split. When a PUD THP
>> is allocated, we cannot know in advance when or why it might need to
>> be split (COW, partial unmap, reclaim), but we need page tables ready
>> for that eventuality. Similar to how PMD THPs deposit a single PTE
>> table, PUD THPs deposit a PMD table which itself contains deposited
>> PTE tables - a two-level deposit. This commit adds the deposit/withdraw
>> infrastructure and a new pud_huge_pmd field in ptdesc to store the
>> deposited PMD.
>>
>> The deposited PMD tables are stored as a singly-linked stack using only
>> page->lru.next as the link pointer. A doubly-linked list using the
>> standard list_head mechanism would cause memory corruption: list_del()
>> poisons both lru.next (offset 8) and lru.prev (offset 16), but lru.prev
>> overlaps with ptdesc->pmd_huge_pte at offset 16. Since deposited PMD
>> tables have their own deposited PTE tables stored in pmd_huge_pte,
>> poisoning lru.prev would corrupt the PTE table list and cause crashes
>> when withdrawing PTE tables during split. PMD THPs don't have this
>> problem because their deposited PTE tables don't have sub-deposits.
>> Using only lru.next avoids the overlap entirely.
>>
>> For reverse mapping, PUD THPs need the same rmap support that PMD THPs
>> have. The page_vma_mapped_walk() function is extended to recognize and
>> handle PUD-mapped folios during rmap traversal. A new TTU_SPLIT_HUGE_PUD
>> flag tells the unmap path to split PUD THPs before proceeding, since
>> there is no PUD-level migration entry format - the split converts the
>> single PUD mapping into individual PTE mappings that can be migrated
>> or swapped normally.
>>
>> Signed-off-by: Usama Arif <usamaarif642@xxxxxxxxx>
>> ---
>> include/linux/huge_mm.h | 5 +++
>> include/linux/mm.h | 19 ++++++++
>> include/linux/mm_types.h | 5 ++-
>> include/linux/pgtable.h | 8 ++++
>> include/linux/rmap.h | 7 ++-
>> mm/huge_memory.c | 8 ++++
>> mm/internal.h | 3 ++
>> mm/page_vma_mapped.c | 35 +++++++++++++++
>> mm/pgtable-generic.c | 83 ++++++++++++++++++++++++++++++++++
>> mm/rmap.c | 96 +++++++++++++++++++++++++++++++++++++---
>> 10 files changed, 260 insertions(+), 9 deletions(-)
>>
<snip>
>> diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
>> index d3aec7a9926ad..2047558ddcd79 100644
>> --- a/mm/pgtable-generic.c
>> +++ b/mm/pgtable-generic.c
>> @@ -195,6 +195,89 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
>> }
>> #endif
>>
>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>> +/*
>> + * Deposit page tables for PUD THP.
>> + * Called with PUD lock held. Stores PMD tables in a singly-linked stack
>> + * via pud_huge_pmd, using only pmd_page->lru.next as the link pointer.
>> + *
>> + * IMPORTANT: We use only lru.next (offset 8) for linking, NOT the full
>> + * list_head. This is because lru.prev (offset 16) overlaps with
>> + * ptdesc->pmd_huge_pte, which stores the PMD table's deposited PTE tables.
>> + * Using list_del() would corrupt pmd_huge_pte with LIST_POISON2.
>
> This is ugly.
>
> Sounds like you want to use llist_node/head instead of list_head for this.
>
> You might be able to avoid taking the lock in some cases. Note that
> pud_lockptr() is mm->page_table_lock as of now.

I agree. I used llist_node/head in my implementation [1] and it works.
I have an illustration at [2] to show the concept. Feel free to reuse the code.

[1] https://lore.kernel.org/all/20200928193428.GB30994@xxxxxxxxxxxxxxxxxxxx/
[2] https://normal.zone/blog/2021-01-04-linux-1gb-thp-2/#new-mechanism
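
To make the llist suggestion concrete, here is a minimal userspace sketch of a
single-pointer deposit stack. It is illustrative only and is not the code from
[1]: every sketch_* name is hypothetical, the structs are stand-ins loosely
modelled on the kernel's llist_node/llist_head, and C11 atomics replace the
kernel's cmpxchg helpers. The point it shows is that linking through one
"next" pointer never writes a second word, so a neighbouring field such as
pmd_huge_pte is left alone.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for a single-pointer list node (assumption, not kernel code). */
struct sketch_llist_node {
	struct sketch_llist_node *next;
};

/* Userspace stand-in for the list head; "first" is updated with compare-and-swap. */
struct sketch_llist_head {
	_Atomic(struct sketch_llist_node *) first;
};

/* Hypothetical descriptor for a deposited PMD table. */
struct sketch_pmd_table {
	struct sketch_llist_node deposit_link;	/* the only word used for linking */
	void *pmd_huge_pte;			/* this table's own deposited PTE tables */
};

/* Push a PMD table onto the deposit stack; writes only deposit_link.next. */
static void sketch_deposit(struct sketch_llist_head *head,
			   struct sketch_pmd_table *table)
{
	struct sketch_llist_node *old_first;

	assert(head != NULL && table != NULL);
	old_first = atomic_load(&head->first);
	do {
		/* On CAS failure old_first is refreshed, so relink and retry. */
		table->deposit_link.next = old_first;
	} while (!atomic_compare_exchange_weak(&head->first, &old_first,
					       &table->deposit_link));
}

/*
 * Pop the most recently deposited PMD table, or NULL if the stack is empty.
 * Assumption: callers serialise withdrawals (for example under the PUD lock),
 * because a lock-free pop with several concurrent consumers is exposed to ABA.
 */
static struct sketch_pmd_table *sketch_withdraw(struct sketch_llist_head *head)
{
	struct sketch_llist_node *first = atomic_load(&head->first);

	while (first != NULL &&
	       !atomic_compare_exchange_weak(&head->first, &first, first->next))
		;	/* first was refreshed by the failed CAS; try again */

	if (first == NULL)
		return NULL;

	/* Recover the containing descriptor from the embedded link. */
	return (struct sketch_pmd_table *)((char *)first -
			offsetof(struct sketch_pmd_table, deposit_link));
}
```

Depositing two tables and withdrawing them returns them in LIFO order, and
each table's pmd_huge_pte still holds the value it had before the deposit,
since nothing poisons or overwrites a second link word.
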
Best Regards,
Yan, Zi