Re: [LSF/MM/BPF TOPIC] Per-process page size

From: David Hildenbrand (Arm)

Date: Mon Feb 23 2026 - 08:03:47 EST

On 2/23/26 13:49, Pedro Falcato wrote:

On Mon, Feb 23, 2026 at 10:37:55AM +0530, Dev Jain wrote:

I don't understand. What exactly are you trying to do here? Maintain 2
different paging structures, one for core mm and the other for the arch? As
done in architectures with no radix tree paging structures?

The mm->pgd will be the software pagetable. So suppose that do_anonymous_page is
doing set_ptes on the PTE table belonging to the software pagetable. We will
hook a "native_set_ptes" into set_ptes, which will set the ptes on a different
pagetable maintained by arm64 code (probably mm_context_t->native_pgd).
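
To make the hook concrete, here is a minimal userspace sketch of the write-through idea, not kernel code. Everything prefixed with toy_ is hypothetical; the two arrays merely stand in for a PTE table under mm->pgd and one under the proposed native_pgd, and the sketch assumes both tables use the same entry format, which a real design with different granules would not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_NR_PTES   16
#define TOY_PAGE_SIZE 0x1000u

/* Toy PTE: just an address, no attribute bits; 0 means "not present". */
typedef uint64_t toy_pte_t;

struct toy_mm {
	toy_pte_t sw_ptes[TOY_NR_PTES];     /* stands in for a table under mm->pgd */
	toy_pte_t native_ptes[TOY_NR_PTES]; /* stands in for a table under native_pgd */
};

/* Hypothetical arch hook: mirror the update into the native table. */
static void toy_native_set_ptes(struct toy_mm *mm, size_t idx,
				toy_pte_t pte, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		mm->native_ptes[idx + i] = pte + i * TOY_PAGE_SIZE;
}

/* Models set_ptes(): fill nr consecutive entries, then call the hook. */
static void toy_set_ptes(struct toy_mm *mm, size_t idx,
			 toy_pte_t pte, size_t nr)
{
	assert(idx + nr <= TOY_NR_PTES);
	for (size_t i = 0; i < nr; i++)
		mm->sw_ptes[idx + i] = pte + i * TOY_PAGE_SIZE;
	toy_native_set_ptes(mm, idx, pte, nr);
}
```

Note that every update touches both tables, so both have to stay allocated for as long as the mapping exists.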

Traditionally, you do this kind of funky manipulation in update_mmu_cache.

But this is still an extremely complex and invasive change (that I assume most
people would not like to see) with dubious benefit.

If so, that's wildly inefficient, unless you're willing to go into reclaimable
page tables on the arm64 side. And that brings extra problems and extra fun :)

I didn't understand the reclaimable reference, but yes we need to make this efficient.

I'm not talking about CPU runtime efficiency, but memory efficiency. Doing
this makes you essentially duplicate page tables - not exactly ideal. This is
a Known Problem in classic UNIX systems which do something similar
(but not the same): anonymous memory pointers are stored in some intermediary
structure (SunOS and UVM call it "amap"), and paging structures are entirely
redundant there. They can freely tear down a page table because they can freely
put it together from the amap and file mappings (what they call vm_object and
we call address_space).

Anyway, I'm boring you with these funny historical details so you can understand
the similarities: the Linux page table format generally matches hardware, and
we store anonymous memory "state" there, so you can't ever tear down a pgtable
without losing state of whatever was mapped there before. However, if you go
down the "arm64 now has a separate pgtable structure" route, the roles switch:
arm64's internal page table format makes for the real page tables, and Linux's
pgtable structure is nothing more than an "amap". So you could (and perhaps
should) freely reclaim arm64 MMU page tables once memory pressure hits, because
they are freely discardable.

Does this make sense?

I've been thinking about building the 64k page tables similar to how HMM/KVM handle theirs: invalidating them through MMU notifiers etc. and building them on demand.

That is, treating the 64k MMU of a process just like a special device that builds its own page tables.

This way, they could be reclaimed more easily, and most of the core + arm64 page table manipulation code could be kept as is.

However, I don't know how large the performance impact of that approach would be.
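
As a rough illustration of that model, here is a userspace sketch (not kernel code, and not the mmu_notifier API itself). All toy_ names are hypothetical. The primary table holds the authoritative state; the secondary table is a derived copy that is filled on a miss, dropped on a notifier-style range invalidation, and can be discarded wholesale under memory pressure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NR_PTES 16

/* Toy PTE: 0 means "not present". */
typedef uint64_t toy_pte_t;

struct toy_mm {
	toy_pte_t primary[TOY_NR_PTES];   /* authoritative, like the core mm table */
	toy_pte_t secondary[TOY_NR_PTES]; /* derived, like a device/64k MMU table */
	unsigned long secondary_faults;   /* how often we had to rebuild an entry */
};

/* Notifier-style callback: drop derived entries in [start, end). */
static void toy_invalidate_range(struct toy_mm *mm, size_t start, size_t end)
{
	assert(start <= end && end <= TOY_NR_PTES);
	memset(&mm->secondary[start], 0, (end - start) * sizeof(toy_pte_t));
}

/* Any change to the primary table invalidates the derived entry first. */
static void toy_primary_set(struct toy_mm *mm, size_t idx, toy_pte_t pte)
{
	assert(idx < TOY_NR_PTES);
	toy_invalidate_range(mm, idx, idx + 1);
	mm->primary[idx] = pte;
}

/* Miss in the secondary table: rebuild one entry from the primary table. */
static bool toy_secondary_fault(struct toy_mm *mm, size_t idx)
{
	assert(idx < TOY_NR_PTES);
	if (!mm->primary[idx])
		return false; /* a real design would enter the core fault path here */
	mm->secondary[idx] = mm->primary[idx];
	mm->secondary_faults++;
	return true;
}

/* Models a hardware access that walks only the secondary table. */
static toy_pte_t toy_access(struct toy_mm *mm, size_t idx)
{
	assert(idx < TOY_NR_PTES);
	if (!mm->secondary[idx] && !toy_secondary_fault(mm, idx))
		return 0;
	return mm->secondary[idx];
}

/* Memory pressure: the whole derived table can go, no state is lost. */
static void toy_reclaim_secondary(struct toy_mm *mm)
{
	toy_invalidate_range(mm, 0, TOY_NR_PTES);
}
```

The secondary_faults counter is where the open performance question shows up in this toy: every invalidation or reclaim is paid for later with an extra fault to rebuild the entry.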

--
Cheers,

David