Re: [PATCH 4/5] mm/page_vma_mapped: use huge_ptep_get() for hugetlb
From: Dev Jain
Date: Fri Jun 26 2026 - 09:23:38 EST
On 26/06/26 1:18 pm, Lance Yang wrote:
>
> On Thu, Jun 25, 2026 at 11:29:53AM +0000, Dev Jain wrote:
>> check_pte() is the final validation step in page_vma_mapped_walk().
>> It reads pvmw->pte with ptep_get() to decide whether the entry maps
>> the PFN range being walked. For hugetlb VMAs, that pointer refers
>> to a hugetlb entry.
>>
>> On arches which provide their own huge_ptep_get() to dereference a huge
>> pte pointer, accessing via ptep_get() would cause pte_pfn(),
>> pte_present() etc to misbehave.
>>
>> It is not clear whether this has a trivially visible effect to userspace.
>>
>> Use huge_ptep_get() to dereference a huge pte pointer.
>>
>> Fixes: ace71a19cec5 ("mm: introduce page_vma_mapped_walk()")
>> Cc: stable@xxxxxxxxxxxxxxx
>> Signed-off-by: Dev Jain <dev.jain@xxxxxxx>
>> ---
>> mm/page_vma_mapped.c | 8 +++++++-
>> 1 file changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
>> index 2ccbabfb2cc17..18e1d341f463c 100644
>> --- a/mm/page_vma_mapped.c
>> +++ b/mm/page_vma_mapped.c
>> @@ -107,7 +107,13 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw, pmd_t *pmdvalp,
>> static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
>> {
>> unsigned long pfn;
>> - pte_t ptent = ptep_get(pvmw->pte);
>> + pte_t ptent;
>> +
>> + if (is_vm_hugetlb_page(pvmw->vma))
>> + ptent = huge_ptep_get(pvmw->vma->vm_mm, pvmw->address,
>> + pvmw->pte);
>
> I think check_pte() can pass a wrong address to huge_ptep_get() ...
Won't this be handled by rmap_walk_anon/rmap_walk_file - they are the ones
performing the rmap traversal and passing address to try_to_unmap_one/folio_referenced_one
etc ...
>
> Not sure that is wrong in the first place. For memory failure,
> page_mapped_in_vma() can be called with a poisoned tail page of a hugetlb
> folio. In that case, pvmw->address need not be hugepage-aligned.
>
> @Miaohe
>
> For arm64, CONT_PMD_SIZE is one supported HugeTLB size. With such a VMA,
> page_vma_mapped_walk() passes that size to hugetlb_walk():
>
> bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> {
> ...
> if (unlikely(is_vm_hugetlb_page(vma))) {
> ...
> pvmw->pte = hugetlb_walk(vma, pvmw->address, size);
> ...
> }
> ...
> }
>
> hugetlb_walk() then calls arm64 huge_pte_offset(mm, addr, sz). For
> sz == CONT_PMD_SIZE, huge_pte_offset() aligns its local addr before
> calculating pmdp:
>
> pte_t *huge_pte_offset(struct mm_struct *mm,
> unsigned long addr, unsigned long sz)
> {
> ...
> if (sz == CONT_PMD_SIZE)
> addr &= CONT_PMD_MASK;
>
> pmdp = pmd_offset(pudp, addr);
> pmd = READ_ONCE(*pmdp);
> ...
> }
>
> So for that case, pvmw->pte is calculated from the aligned addr, not
> necessarily from the original pvmw->address. But check_pte() passes the
> original address together with pvmw->pte:
>
> + ptent = huge_ptep_get(pvmw->vma->vm_mm, pvmw->address,
> + pvmw->pte);
>
> arm64 then uses that addr again to choose ncontig:
>
> pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
> {
> ...
> ncontig = find_num_contig(mm, addr, ptep, &pgsize);
> for (i = 0; i < ncontig; i++, ptep++) {
> ...
> }
> return orig_pte;
> }
>
> static int find_num_contig(struct mm_struct *mm, unsigned long addr,
> pte_t *ptep, size_t *pgsize)
> {
> pgd_t *pgdp = pgd_offset(mm, addr);
> p4d_t *p4dp;
> pud_t *pudp;
> pmd_t *pmdp;
>
> *pgsize = PAGE_SIZE;
> p4dp = p4d_offset(pgdp, addr);
> pudp = pud_offset(p4dp, addr);
> pmdp = pmd_offset(pudp, addr);
> if ((pte_t *)pmdp == ptep) {
> *pgsize = PMD_SIZE;
> return CONT_PMDS;
> }
> return CONT_PTES;
> }
>
> With a tail address, pmdp may no longer point at pvmw->pte, so
> find_num_contig() can return CONT_PTES for a CONT_PMD HugeTLB mapping.
>
> On 16K arm64, that changes ncontig from 32 to 128. So huge_ptep_get()
> can walk past the CONT_PMD entries, and possibly past the PMD table.
>
> Should check_pte() pass the address matching pvmw->pte, sth like:
>
> ---8<---
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index 406fd50bbd8f..58463493bd3d 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -109,11 +109,14 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw, unsigned long pte_nr)
> unsigned long pfn;
> pte_t ptent;
>
> - if (is_vm_hugetlb_page(pvmw->vma))
> - ptent = huge_ptep_get(pvmw->vma->vm_mm, pvmw->address,
> - pvmw->pte);
> - else
> + if (is_vm_hugetlb_page(pvmw->vma)) {
> + struct hstate *hstate = hstate_vma(pvmw->vma);
> + unsigned long haddr = pvmw->address & huge_page_mask(hstate);
> +
> + ptent = huge_ptep_get(pvmw->vma->vm_mm, haddr, pvmw->pte);
> + } else {
> ptent = ptep_get(pvmw->pte);
> + }
>
> if (pvmw->flags & PVMW_MIGRATION) {
> const softleaf_t entry = softleaf_from_pte(ptent);
> --
>
> while leaving pvmw->address unchanged for page_mapped_in_vma()?
>
> Cheers, Lance
>
>> + else
>> + ptent = ptep_get(pvmw->pte);
>>
>> if (pvmw->flags & PVMW_MIGRATION) {
>> const softleaf_t entry = softleaf_from_pte(ptent);
>> --
>> 2.43.0
>>
>>