Re: [PATCH 4/5] mm/page_vma_mapped: use huge_ptep_get() for hugetlb
From: Lance Yang
Date: Mon Jun 29 2026 - 04:23:30 EST
On 2026/6/29 16:05, David Hildenbrand (Arm) wrote:
On 6/29/26 09:48, Lance Yang wrote:
powerpc handles it correctly in the weird "span two PMD entries" case by
On Mon, Jun 29, 2026 at 09:25:48AM +0200, David Hildenbrand (Arm) wrote:
On 6/29/26 08:48, Dev Jain wrote:
Sashiko notes other places:
https://sashiko.dev/#/patchset/20260625112955.3254283-1-dev.jain%40arm.com
Yeah, that looks shaky. We do seem to have a bunch of these cases, primarily
from pagewalk code (where some users like pagemap need the actual address).
Indeed ...
I think we have two options
1) To prevent any (further) issues, make huge_ptep_get() always consume the
hstate, and let the arch code deal with aligning it. Invasive.
Kinda lean toward option 1, even if it's more invasive. If we pass the
hstate down, each arch can figure out the right addr from there.
2) Make the arch code handle aligning without the hstate.
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 30772a909aea3..303a1b74796c9 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -126,6 +126,9 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 		return orig_pte;
 	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
+	ptep = PTR_ALIGN_DOWN(ptep, sizeof(*ptep) * ncontig);
+	orig_pte = __ptep_get(ptep);
+
 	for (i = 0; i < ncontig; i++, ptep++) {
 		pte_t pte = __ptep_get(ptep);
(nshift/order instead of ncontig might avoid a multiplication, but not sure if that matters in practice)
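To make the pointer arithmetic in that hunk concrete, here is a small user-space sketch. It only models the rounding that PTR_ALIGN_DOWN(ptep, sizeof(*ptep) * ncontig) performs; the pte_t stand-in, cont_block_start() and demo_block_index() are names made up for this example and are not kernel code. It assumes ncontig is a power of two and that the table is aligned at least to the block size, which holds for page-aligned page tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's pte_t (8 bytes on arm64). */
typedef struct { uint64_t val; } pte_t;

/*
 * Models PTR_ALIGN_DOWN(ptep, sizeof(*ptep) * ncontig): round the entry
 * pointer down to the first entry of its contiguous block.
 */
static pte_t *cont_block_start(pte_t *ptep, size_t ncontig)
{
	uintptr_t span = sizeof(*ptep) * ncontig;

	/* The mask trick only works for a power-of-two span. */
	assert(span != 0 && (span & (span - 1)) == 0);
	return (pte_t *)((uintptr_t)ptep & ~(span - 1));
}

/* Illustration only: index of the block start for entry idx in a demo table. */
static size_t demo_block_index(size_t idx, size_t ncontig)
{
	/* Aligned like a (small) page table so that block boundaries line up. */
	static _Alignas(512) pte_t table[64];

	assert(idx < 64);
	return (size_t)(cont_block_start(&table[idx], ncontig) - table);
}
```

With 16 contiguous entries, a pointer to entry 21 rounds down to entry 16, and a pointer that is already at the block start is left unchanged.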
IIUC, that's similar to what huge_ptep_get() does on ppc.
static inline pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
	if (ptep_is_8m_pmdp(mm, addr, ptep))
		ptep = pte_offset_kernel((pmd_t *)ptep, ALIGN_DOWN(addr, SZ_8M));
	return ptep_get(ptep);
}
I'd assume we could do the same on riscv. Besides that, I don't think any arch has cont
entries.
AFAICT, for huge_ptep_get() the addr users are arm64 and powerpc; riscv
doesn't really care about addr there. Looks mostly arm64-specific ...
aligning the PMD down.
RISC-V copied from arm64, but can simply derive the number of entries from the
PTE value; it doesn't have to re-walk the table using the address.
Yeah, fair enough, thanks for spelling that out!
But I think the following is required to fix, no?
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index a6d217112cf46..7e25cc13b3dba 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -5,6 +5,7 @@
 #ifdef CONFIG_RISCV_ISA_SVNAPOT
 pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 {
-	unsigned long pte_num;
+	unsigned long pte_num, pte_order;
 	int i;
 	pte_t orig_pte = ptep_get(ptep);
@@ -12,7 +13,11 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 	if (!pte_present(orig_pte) || !pte_napot(orig_pte))
 		return orig_pte;
-	pte_num = napot_pte_num(napot_cont_order(orig_pte));
+	pte_order = napot_cont_order(orig_pte);
+	pte_num = napot_pte_num(pte_order);
+
+	ptep = PTR_ALIGN_DOWN(ptep, sizeof(*ptep) << pte_order);
+	orig_pte = ptep_get(ptep);
 	for (i = 0; i < pte_num; i++, ptep++) {
 		pte_t pte = ptep_get(ptep);
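A user-space model of why the align-down matters for the loop. This is a sketch under assumptions: it supposes the loop body (not shown in the hunk) folds per-entry dirty/young bits into the returned value, and the DEMO_* bit values, demo_table, gather_flags() and demo_run() are all invented for illustration. The table holds two 16-entry blocks of interest, 16..31 and 32..47; entry 17 is dirty, entry 33 (next block) is young, and the caller passes a pointer to entry 20.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for pte_t and made-up flag bits. */
typedef struct { uint64_t val; } pte_t;

#define DEMO_PTE_DIRTY	(1ULL << 0)
#define DEMO_PTE_YOUNG	(1ULL << 1)

/* Aligned like a small page table so block boundaries line up. */
static _Alignas(512) pte_t demo_table[64];

/* OR together the dirty/young bits of (1 << order) entries starting at ptep. */
static uint64_t gather_flags(const pte_t *ptep, unsigned int order, int align_first)
{
	size_t num = (size_t)1 << order;
	uint64_t flags = 0;
	size_t i;

	if (align_first) {
		/* Same rounding as PTR_ALIGN_DOWN(ptep, sizeof(*ptep) << order). */
		uintptr_t span = sizeof(*ptep) << order;

		ptep = (const pte_t *)((uintptr_t)ptep & ~(span - 1));
	}

	for (i = 0; i < num; i++, ptep++)
		flags |= ptep->val & (DEMO_PTE_DIRTY | DEMO_PTE_YOUNG);

	return flags;
}

/* Set up the scenario described above and scan from entry 20. */
static uint64_t demo_run(int align_first)
{
	size_t i;

	for (i = 0; i < 64; i++)
		demo_table[i].val = 0;
	demo_table[17].val = DEMO_PTE_DIRTY;	/* inside block 16..31 */
	demo_table[33].val = DEMO_PTE_YOUNG;	/* belongs to block 32..47 */

	return gather_flags(&demo_table[20], 4, align_first);
}
```

Without the align-down, the scan covers entries 20..35: it misses the dirty bit on entry 17 and picks up the young bit from the neighbouring block. With it, the scan covers 16..31 and reports only the dirty bit.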
I'd prefer (2) as a simple stable fix first.
Right. I'm good with (2) as the stable fix first :)
Still pretty new to arch code, but happy to stare at it some more.
If we do (1) on top, huge_ptep_get() on arm64 could stop walking the page table
another time.
If we pass the hstate (or vma) to set_huge_pte_at(), huge_pte_clear(),
huge_ptep_get_and_clear(), we could likely get rid of the re-walk in
num_contig_ptes() entirely and possibly just remove it.
That would probably be cleanest.
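As a rough illustration of deriving the entry count from the huge page size alone, with no table walk, here is a sketch assuming arm64 with a 4K granule (64K = 16 contiguous PTEs, 32M = 16 contiguous PMDs). The DEMO_* constants and demo_num_contig() are made up for this example; this is not the kernel's helper.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes for a 4K granule; names are hypothetical. */
#define DEMO_PAGE_SIZE		(1UL << 12)
#define DEMO_PMD_SIZE		(1UL << 21)
#define DEMO_PUD_SIZE		(1UL << 30)
#define DEMO_CONT_ENTRIES	16
#define DEMO_CONT_PTE_SIZE	(DEMO_CONT_ENTRIES * DEMO_PAGE_SIZE)
#define DEMO_CONT_PMD_SIZE	(DEMO_CONT_ENTRIES * DEMO_PMD_SIZE)

/*
 * Map a huge page size to the number of page table entries backing it and
 * the size covered by each entry. Returns 0 for an unsupported size.
 */
static int demo_num_contig(unsigned long hpage_size, unsigned long *pgsize)
{
	assert(pgsize != NULL);

	switch (hpage_size) {
	case DEMO_CONT_PTE_SIZE:
		*pgsize = DEMO_PAGE_SIZE;
		return DEMO_CONT_ENTRIES;
	case DEMO_PMD_SIZE:
		*pgsize = DEMO_PMD_SIZE;
		return 1;
	case DEMO_CONT_PMD_SIZE:
		*pgsize = DEMO_PMD_SIZE;
		return DEMO_CONT_ENTRIES;
	case DEMO_PUD_SIZE:
		*pgsize = DEMO_PUD_SIZE;
		return 1;
	default:
		*pgsize = 0;
		return 0;
	}
}
```

The point is only that once the size is known from the hstate, both the entry count and the per-entry size follow from it, so nothing needs to be read back from the page table.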
Agreed!