[RFC PATCH v3 0/8] Implement a new generic pagewalk API

From: Oscar Salvador

Date: Mon May 25 2026 - 12:55:56 EST


Changelog:
rfcv2 -> rfcv3:
- Fix an out-of-bounds write
- Convert clear_refs to the new API
- Fix issue when reading cont-PMDs
rfc -> rfcv2:
- Add pte_hole functionality
- Fix pagemap issues
- Fix shmem in smap
- Testing with pagemap "testsuite"

[WARNING]

This is not complete yet, but before investing more time into it I would like
to know whether 1) this is heading in the right direction and 2) this is something
we are still interested in.
There are still things that need work:

- Convert make_uffd_wp_huge_pte: since hugetlb is being dealt with as a
PTE, it inherited PTE_MARKERs when those came into play, and
AFAIK those are mostly used for UFFD.
From here we have two options: 1) find another way to deal with
UFFD without markers or 2) introduce markers for the PMD and PUD levels.
I am leaning towards option 1), because 2) seems a bit unfair.
I still need to put some thought into it and see how we can achieve
that.

- Teach the new API how to use other kinds of locks. E.g: pagemap scan
needs to take i_mmap_lock during the scan, so we need to be able to
take that lock. I have some ideas on how to do that, but that is something
for the next version.

- Find corner cases and fix them.


Kudos go to David, who suggested the interface and gave me some
ideas on where to begin, besides providing feedback in the early
stages (in case there is something stupid, don't blame him, blame me).

Also, I would like to thank Vlastimil, who helped me run this
patchset through Claude quite a few times, which caught some issues.

[/WARNING]

[TESTING]
Part of the testing has been to duplicate
/proc/$$/(pagemap,smaps,numa_maps,clear_refs): the copies, with a
_lab suffix, are wired to the old API.
That way I could check whether the output of e.g: /proc/$$/smaps
and /proc/$$/smaps_lab was the same for any given program.
I did the same for pagemap and numa_maps.
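To illustrate the comparison step, here is a minimal userspace sketch, assuming the two outputs (e.g. from /proc/$$/smaps and /proc/$$/smaps_lab) have already been read into NUL-terminated buffers. first_diff_line() is a hypothetical helper for illustration only, not part of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compare two NUL-terminated text buffers line by line.
 * Returns 0 if they are identical, otherwise the 1-based number of the
 * first line that differs (a missing line counts as a difference).
 */
static size_t first_diff_line(const char *a, const char *b)
{
	size_t line = 1;

	assert(a && b);
	for (;;) {
		const char *ea = strchr(a, '\n');
		const char *eb = strchr(b, '\n');
		size_t la = ea ? (size_t)(ea - a) : strlen(a);
		size_t lb = eb ? (size_t)(eb - b) : strlen(b);

		/* Same length and same bytes, or this line differs. */
		if (la != lb || memcmp(a, b, la) != 0)
			return line;
		/* One buffer ran out of lines before the other. */
		if (!ea || !eb)
			return (ea || eb) ? line + 1 : 0;
		a = ea + 1;
		b = eb + 1;
		line++;
	}
}
```

Reporting a line number rather than a boolean makes it easy to spot which mapping or counter diverged between the new and the old API.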

Also, regarding pagemap:
so far, tools/mm/page-types.c reports the right output (compared to the old API),
and tools/testing/selftests/mm/pagemap_ioctl.c reports only 4 failing tests.
To be honest, though, I do not know how much I should trust the latter: if I
add a few delays in the userspace code, some tests that were failing before no
longer fail.

localhost:~/workspace # ./page-types -p 1168
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000000800 1 0 ___________M_______________________________ mmap
0x0000000000000828 2 0 ___U_l_____M_______________________________ uptodate,lru,mmap
0x000000000000082c 1 0 __RU_l_____M_______________________________ referenced,uptodate,lru,mmap
0x0000000000004838 1 0 ___UDl_____M__b____________________________ uptodate,dirty,lru,mmap,swapbacked
0x000000000000086c 423 1 __RU_lA____M_______________________________ referenced,uptodate,lru,active,mmap
0x0000000000205828 29 0 ___U_l_____Ma_b______x_____________________ uptodate,lru,mmap,anonymous,swapbacked,ksm
0x000000000020586c 1 0 __RU_lA____Ma_b______x_____________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked,ksm
total 458 1

localhost:~/workspace # ./page-types_lab -p 1168
flags page-count MB symbolic-flags long-symbolic-flags
0x0000000000000804 1 0 __R________M_______________________________ referenced,mmap
0x0000000000000828 2 0 ___U_l_____M_______________________________ uptodate,lru,mmap
0x000000000000082c 1 0 __RU_l_____M_______________________________ referenced,uptodate,lru,mmap
0x0000000000004838 1 0 ___UDl_____M__b____________________________ uptodate,dirty,lru,mmap,swapbacked
0x000000000000086c 423 1 __RU_lA____M_______________________________ referenced,uptodate,lru,active,mmap
0x0000000000205828 29 0 ___U_l_____Ma_b______x_____________________ uptodate,lru,mmap,anonymous,swapbacked,ksm
0x000000000020586c 1 0 __RU_lA____Ma_b______x_____________________ referenced,uptodate,lru,active,mmap,anonymous,swapbacked,ksm
total 458 1

page-types uses the new API and page-types_lab the old one.

# ./pagemap_ioctl
TAP version 13
1..117
ok 1 sanity_tests_sd Zero range size is valid
ok 2 sanity_tests_sd output bu
ok 35 Walk_end: 1 max page
ok 36 Page testing: all new pages must not be written (dirty)
ok 37 Page testing: all pages must be written (dirty)
ok 38 Page testing: all pages dirty other than first and the last one
ok 39 Page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 40 Page testing: only middle page dirty
ok 41 Page testing: only two middle pages dirty
ok 42 Large Page testing: all new pages must not be written (dirty)
ok 43 Large Page testing: all pages must be written (dirty)
ok 44 Large Page testing: all pages dirty other than first and the last one
ok 45 Large Page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 46 Large Page testing: only middle page dirty
ok 47 Large Page testing: only two middle pages dirty
ok 48 Huge page testing: all new pages must not be written (dirty)
ok 49 Huge page testing: all pages must be written (dirty)
ok 50 Huge page testing: all pages dirty other than first and the last one
ok 51 Huge page testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 52 Huge page testing: only middle page dirty
ok 53 Huge page testing: only two middle pages dirty
ok 54 Hugetlb shmem testing: all new pages must not be written (dirty)
ok 55 Hugetlb shmem testing: all pages must be written (dirty)
ok 56 Hugetlb shmem testing: all pages dirty other than first and the last one
ok 57 Hugetlb shmem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 58 Hugetlb shmem testing: only middle page dirty
not ok 59 Hugetlb shmem testing: only two middle pages dirty
ok 60 Hugetlb mem testing: all new pages must not be written (dirty)
ok 61 Hugetlb mem testing: all pages must be written (dirty)
ok 62 Hugetlb mem testing: all pages dirty other than first and the last one
ok 63 Hugetlb mem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 64 Hugetlb mem testing: only middle page dirty
not ok 65 Hugetlb mem testing: only two middle pages dirty
ok 66 Hugetlb shmem testing: all new pages must not be written (dirty)
ok 67 Hugetlb shmem testing: all pages must be written (dirty)
ok 68 Hugetlb shmem testing: all pages dirty other than first and the last one
ok 69 Hugetlb shmem testing: PM_SCAN_WP_MATCHING | PM_SCAN_CHECK_WPASYNC
ok 70 Hugetlb shmem testing: only middle page dirty
not ok 71 Hugetlb shmem testing: only two middle pages dirty
ok 72 File memory testing: all new pages must not be written (dirty)
ok 73 File memory testing: all p
# Totals: pass:113 fail:4 xfail:0 xpass:0 skip:0 error:0

[/TESTING]

At LSFMM/BPF 2025, there was general agreement that 1) we would like to have
a generic pagewalk API, 2) it should replace the existing callback-based one if
possible, and 3) HugeTLB should be able to use it without being special-cased
(e.g: not having to depend on .hugetlb_entry callbacks, which means a lot of
duplicated code and a lot of special casing just because of HugeTLB lore).

The pt_range_walk API tries to do that: it replaces the old "in HugeTLB
world everything reads as a PTE" behaviour and reads HugeTLB entries for
what they really are, i.e. PMD/PUD entries and contiguous-PMD/PTE
entries.
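For reference, a minimal userspace sketch of what reading HugeTLB entries at their real level means on arm64 with a 4K granule, where the HugeTLB sizes 64K, 2M, 32M and 1G are mapped by contiguous PTEs, a PMD, contiguous PMDs and a PUD respectively. The enum and hugetlb_size_to_level() are hypothetical and only model the size-to-level mapping; other granules and architectures use different sizes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the level a HugeTLB page is really mapped at. */
enum hugetlb_level {
	HL_INVALID,
	HL_CONT_PTE,	/* 16 x 4K = 64K */
	HL_PMD,		/* 2M */
	HL_CONT_PMD,	/* 16 x 2M = 32M */
	HL_PUD,		/* 1G */
};

/* Assumed arm64 4K-granule geometry. */
#define SZ_4K	(UINT64_C(1) << 12)
#define SZ_2M	(UINT64_C(1) << 21)
#define SZ_1G	(UINT64_C(1) << 30)
#define NR_CONT	16	/* entries per contiguous range */

static_assert(NR_CONT * SZ_2M == (UINT64_C(32) << 20), "cont-PMD is 32M");

static enum hugetlb_level hugetlb_size_to_level(uint64_t size)
{
	if (size == NR_CONT * SZ_4K)
		return HL_CONT_PTE;
	if (size == SZ_2M)
		return HL_PMD;
	if (size == NR_CONT * SZ_2M)
		return HL_CONT_PMD;
	if (size == SZ_1G)
		return HL_PUD;
	return HL_INVALID;	/* not a HugeTLB size with this geometry */
}
```

Under the old behaviour all four cases were presented to the walker as a single "PTE"; the new API is meant to expose the level shown here instead.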

To achieve that, we need some infrastructure that was not really needed until
now, in order to read HugeTLB pages as PUD/PMD entries.
E.g: softleaf_from_pud had to be added, along with some other pud_* functions.

In a nutshell, this API walks an address range and returns
whatever is in there (swap/hwpoison/migration/marker entries, folios,
PFN and device entries, or nothing).

These are the internal return types the API uses:

PT_TYPE_NONE
PT_TYPE_FOLIO
PT_TYPE_MARKER
PT_TYPE_PFN
PT_TYPE_SWAP
PT_TYPE_MIGRATION
PT_TYPE_DEVICE
PT_TYPE_HWPOISON

The API also handles locking and batching itself, so the caller
does not really need to bother with that.
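A rough userspace model of this calling convention, assuming an iterator-style walk (the real pt_range_walk interface is described in patch #4 and may well differ). Only the PT_TYPE_* names come from this letter, and their numeric values here are arbitrary; struct model_entry, struct model_walk, model_walk_next() and model_account() are hypothetical, and the page tables are replaced by a pre-decoded array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Entry types named in the cover letter; numeric values are arbitrary. */
enum pt_type {
	PT_TYPE_NONE,
	PT_TYPE_FOLIO,
	PT_TYPE_MARKER,
	PT_TYPE_PFN,
	PT_TYPE_SWAP,
	PT_TYPE_MIGRATION,
	PT_TYPE_DEVICE,
	PT_TYPE_HWPOISON,
	PT_TYPE_NR,
};

/* Hypothetical: what one step of the walk reports back to the caller. */
struct model_entry {
	enum pt_type type;
	uint64_t size;		/* bytes covered: a PTE, a PMD, a batch, ... */
};

/* Hypothetical iterator over a pre-decoded range (stands in for page tables). */
struct model_walk {
	const struct model_entry *entries;
	size_t nr;
	size_t pos;
};

/* Hand back the next entry; locking and batching would be hidden in here. */
static bool model_walk_next(struct model_walk *w, struct model_entry *out)
{
	if (w->pos >= w->nr)
		return false;
	*out = w->entries[w->pos++];
	return true;
}

/* smaps-like consumer: one loop, no separate HugeTLB path. */
static void model_account(struct model_walk *w, uint64_t bytes[PT_TYPE_NR])
{
	struct model_entry e;

	for (int t = 0; t < PT_TYPE_NR; t++)
		bytes[t] = 0;
	while (model_walk_next(w, &e)) {
		assert(e.type < PT_TYPE_NR);
		bytes[e.type] += e.size;
	}
}
```

The point of the sketch is the consumer side: a user switches or indexes on the returned type and size, whether the entry came from a PTE, a PMD or a PUD.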

To handle contiguous-PMD-mapped HugeTLB pages, folio_pmd_batch,
the PMD analogue of folio_pte_batch, has been implemented.
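The batching idea can be sketched in userspace as below, assuming already-decoded leaf entries. Like folio_pte_batch, it counts how many consecutive entries map physically contiguous pages of the same folio; the real helpers also compare entry flags, which this hypothetical model_leaf_batch() omits. pfn_stride is 1 for PTEs and 512 for 2M PMDs with 4K base pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded leaf entry (PTE or PMD). */
struct model_leaf {
	int present;
	uint64_t pfn;		/* first pfn mapped by this entry */
	int folio_id;		/* which folio that pfn belongs to */
};

/*
 * Count how many entries, starting at ents[0], map physically consecutive
 * pages of the same folio, capped at max_nr.  pfn_stride is the number of
 * base pages each entry maps.
 */
static size_t model_leaf_batch(const struct model_leaf *ents, size_t max_nr,
			       uint64_t pfn_stride)
{
	size_t nr = 1;

	assert(max_nr > 0 && ents[0].present);
	while (nr < max_nr &&
	       ents[nr].present &&
	       ents[nr].folio_id == ents[0].folio_id &&
	       ents[nr].pfn == ents[0].pfn + nr * pfn_stride)
		nr++;
	return nr;
}
```

Returning a count lets the caller account a whole contiguous-PMD range in one step instead of visiting each PMD separately.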

More information about the API can be found in patch #4.

This was tested on x86_64 and arm64 but, as I said, it is still
incomplete, hence the RFC: to gather some initial feedback before
investing more time into this.

For now, all users of the old API from fs/proc/task_mmu.c have been
converted: /proc/pid/(smaps|numa_maps|pagemap|clear_refs).

Thanks in advance

Oscar Salvador (8):
mm: Add softleaf_from_pud
mm: Add {pmd,pud}_huge_lock helper
mm: Implement folio_pmd_batch
mm: Implement pt_range_walk
mm: Make /proc/pid/smaps use the new generic pagewalk API
mm: Make /proc/pid/numa_maps use the new generic pagewalk API
mm: Make /proc/pid/pagemap use the new generic pagewalk API
mm: Make /proc/pid/clear_refs use the new generic pagewalk API

arch/arm64/include/asm/pgtable.h | 41 +
arch/loongarch/include/asm/pgtable.h | 1 +
arch/powerpc/include/asm/book3s/64/pgtable.h | 7 +
arch/s390/include/asm/pgtable.h | 38 +
arch/x86/include/asm/pgtable.h | 53 +
arch/x86/include/asm/pgtable_64.h | 2 +
arch/x86/mm/pgtable.c | 18 +-
fs/proc/task_mmu.c | 2295 ++++++++----------
include/asm-generic/pgtable_uffd.h | 15 +
include/linux/leafops.h | 46 +
include/linux/mm.h | 2 +
include/linux/mm_inline.h | 32 +
include/linux/pagewalk.h | 106 +
include/linux/pgtable.h | 95 +
mm/internal.h | 75 +-
mm/memory.c | 22 +
mm/pagewalk.c | 400 +++
mm/pgtable-generic.c | 21 +
18 files changed, 2039 insertions(+), 1230 deletions(-)

--
2.53.0