Re: [PATCH 4/5] mm/page_vma_mapped: use huge_ptep_get() for hugetlb

From: Dev Jain

Date: Tue Jun 30 2026 - 09:59:46 EST




On 30/06/26 6:16 pm, David Hildenbrand (Arm) wrote:
> On 6/30/26 13:34, Dev Jain wrote:
>>
>>
>> On 29/06/26 1:35 pm, David Hildenbrand (Arm) wrote:
>>> On 6/29/26 09:48, Lance Yang wrote:
>>>>
>>>> >from pagewalk code (where some users like pagemap need the actual address).
>>>>
>>>> Indeed ...
>>>>
>>>>
>>>> Kinda lean toward option 1, even if it's more invasive. If we pass the
>>>> hstate down, each arch can figure out the right addr from there.
>>>>
>>>>
>>>> AFAICT, for huge_ptep_get() the addr users are arm64 and powerpc, riscv
>>>> doesn't really care about addr there. Looks mostly arm64-specific ...
>>> powerpc handles it correctly in the weird "span two PMD entries" case by
>>> aligning the PMD down.
>>>
>>> Risc-v copied from arm64, but can simply derive the #entries from the PTE value.
>>> it doesn't have to re-walk the table using the address.
>>>
>>> But I think the following is required to fix, no?
>>
>> We don't receive an unaligned ptep in huge_ptep_get, and riscv derives the
>> number of cont ptes from the pte itself, so why is the below required?
>
> Let me look at the actual report once more ...
>
> I thought for a second that the problem would be having the ptep not point at the
> start of the hugetlb page mapping. But that should always be the case.
> So yes, riscv does not have any problems.
>
> And IIUC, arm64 only has a problem when CONT_PTES != CONT_PMDS (16K kernel?).
>
> Yeah, aligning the ptep down doesn't solve anything, it's already properly aligned.
>
> To fix it inside arm64 code, we'd have to teach find_num_contig() to
> ignore the ptep and instead look for the cont bit, maybe?
>
> But I'm sure I messed this up as I am working on 10 things at the same time :D
>
>
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index d477a9dd1b472..d1d03795c135e 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -76,7 +76,7 @@ bool arch_hugetlb_migration_supported(struct hstate *h)
> #endif
>
> static int find_num_contig(struct mm_struct *mm, unsigned long addr,
> - pte_t *ptep, size_t *pgsize)
> + size_t *pgsize)
> {
> pgd_t *pgdp = pgd_offset(mm, addr);
> p4d_t *p4dp;
> @@ -87,7 +87,7 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
> p4dp = p4d_offset(pgdp, addr);
> pudp = pud_offset(p4dp, addr);
> pmdp = pmd_offset(pudp, addr);
> - if ((pte_t *)pmdp == ptep) {
> + if (pmd_cont(*pmdp)) {

We can simply do this, right?

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index b8432886085af..a35fa373263dc 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -87,7 +87,7 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
 	p4dp = p4d_offset(pgdp, addr);
 	pudp = pud_offset(p4dp, addr);
 	pmdp = pmd_offset(pudp, addr);
-	if ((pte_t *)pmdp == ptep) {
+	if ((pte_t *)PTR_ALIGN_DOWN(pmdp, sizeof(*pmdp) * CONT_PMDS) == ptep) {
 		*pgsize = PMD_SIZE;
 		return CONT_PMDS;
 	}
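
To illustrate what the aligned comparison is meant to do, here is a small
userspace sketch. It is not kernel code: pmd_t, the CONT_PMDS value of 32,
ptr_align_down(), pmd_table and in_cont_pmd_block() are all stand-ins made up
for the example, and it assumes the table is aligned like a page-table page.
The idea is that pmdp, derived from an addr somewhere inside the mapping, is
rounded down to the first entry of its contiguous block before being compared
with ptep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel type and constant. */
typedef uint64_t pmd_t;
#define CONT_PMDS 32	/* assumed value; arm64 derives it from the page size */

/* Userspace stand-in: round a pointer down to a power-of-two boundary. */
#define ptr_align_down(p, a) \
	((void *)((uintptr_t)(p) & ~((uintptr_t)(a) - 1)))

/* Aligned like a page-table page, so contiguous blocks start on aligned entries. */
static _Alignas(4096) pmd_t pmd_table[512];

/*
 * Return 1 if pmdp lies inside the contiguous block whose first entry is
 * ptep, 0 otherwise. This mirrors the comparison in the hunk above.
 */
static int in_cont_pmd_block(pmd_t *pmdp, pmd_t *ptep)
{
	return ptr_align_down(pmdp, sizeof(*pmdp) * CONT_PMDS) == (void *)ptep;
}
```

With these assumptions, an entry in the middle of a block (say
&pmd_table[69] against a block starting at &pmd_table[64]) matches, whereas
the plain pointer comparison would only match the first entry.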


> *pgsize = PMD_SIZE;
> return CONT_PMDS;
> }
> @@ -131,7 +131,7 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> if (!pte_present(orig_pte) || !pte_cont(orig_pte))
> return orig_pte;
>
> - ncontig = find_num_contig(mm, addr, ptep, &pgsize);
> + ncontig = find_num_contig(mm, addr, &pgsize);
> for (i = 0; i < ncontig; i++, ptep++) {
> pte_t pte = __ptep_get(ptep);
>
> @@ -475,7 +475,7 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
> return;
> }
>
> - ncontig = find_num_contig(mm, addr, ptep, &pgsize);
> + ncontig = find_num_contig(mm, addr, &pgsize);
>
> pte = get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
> pte = pte_wrprotect(pte);
> diff --git a/mm/memory.c b/mm/memory.c
>
>