Re: [LSF/MM/BPF TOPIC] 64k (or 16k) base page size on x86

From: Matthew Wilcox

Date: Wed Apr 29 2026 - 10:44:56 EST


On Thu, Feb 19, 2026 at 03:08:51PM +0000, Kiryl Shutsemau wrote:
> No, there's no new hardware (that I know of). I want to explore what page size
> means.
>
> The kernel uses the same value - PAGE_SIZE - for two things:
>
> - the order-0 buddy allocation size;
>
> - the granularity of virtual address space mapping;
>
> I think we can benefit from separating these two meanings and allowing
> order-0 allocations to be larger than the virtual address space covered by a
> PTE entry.

I actually want to go in the other direction. I once came up with a
name -- POTAM -- which stands for Power Of Two Allocator with Metadata.
The use case was something like XFS's buffer cache where we want a
filesystem block size of data (so 0.5KiB to 64KiB) with some metadata
attached (xfs_buf is 664 bytes with debugging enabled!)

I set this aside to work on folios, but folios offer a back door to
unifying this with the buddy allocator. It's a long road, but here's
a sketch:

First, we separate memdescs from pages. I believe this lets us shrink
struct page down to 8 bytes (as previously presented at various LSFMMs).
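
A rough userspace sketch of what an 8-byte struct page could look like,
not actual kernel code: one word holding a pointer to the separately
allocated memdesc, with a type tag in the low bits. The 4-bit tag width,
the enum values and the helper names are all assumptions made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: memdescs are at least 16-byte aligned, freeing 4 low bits. */
#define MEMDESC_TYPE_BITS 4
#define MEMDESC_TYPE_MASK ((uintptr_t)((1u << MEMDESC_TYPE_BITS) - 1))

/* Hypothetical type tags; the real set of memdesc types is not given here. */
enum memdesc_type {
	MEMDESC_FOLIO = 1,
	MEMDESC_SLAB = 2,
	MEMDESC_PTDESC = 3,
};

/* One word per page: tagged pointer to the memdesc that owns it. */
struct page {
	uintptr_t memdesc;
};

/* One pointer-sized word, i.e. 8 bytes on a 64-bit build. */
_Static_assert(sizeof(struct page) == sizeof(uintptr_t),
	       "struct page must be a single word");

static inline void page_set_memdesc(struct page *page, void *desc,
				    enum memdesc_type type)
{
	uintptr_t ptr = (uintptr_t)desc;

	/* The low bits must be free to carry the tag. */
	assert((ptr & MEMDESC_TYPE_MASK) == 0);
	assert(((uintptr_t)type & ~MEMDESC_TYPE_MASK) == 0);
	page->memdesc = ptr | (uintptr_t)type;
}

static inline enum memdesc_type page_memdesc_type(const struct page *page)
{
	return (enum memdesc_type)(page->memdesc & MEMDESC_TYPE_MASK);
}

static inline void *page_memdesc_ptr(const struct page *page)
{
	return (void *)(page->memdesc & ~MEMDESC_TYPE_MASK);
}
```

Everything that lives in struct page today would move into the memdesc
that the word points at.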

Second, we get rid of 'page' in things like sglist and bvec. This is
already in progress for various other good reasons.

Third (this bit is new), we replace memmap with something like a maple
tree. That lets us look up memdescs by physical address (typically
a memdesc will contain either the physical or virtual address of the
memory it controls).

Fourth, we change the unit of the lookup in the maple tree from being
a PFN to being address / 512 (or whatever size we want to use as our
minimum).
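
To make the indexing concrete, here is a userspace sketch only: a sorted
array with a binary search stands in for the maple tree, and the struct
layout, the MEMDESC_SHIFT name and both helpers are hypothetical. It
assumes each memdesc covers a power-of-two number of 512-byte units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEMDESC_SHIFT 9	/* assumed minimum unit: 512 bytes */

/* Hypothetical descriptor covering (512 << order) bytes starting at phys. */
struct memdesc {
	uint64_t phys;
	unsigned int order;
};

/* Lookup key: address / 512 instead of a PFN (address / PAGE_SIZE). */
static inline uint64_t memdesc_index(uint64_t phys)
{
	return phys >> MEMDESC_SHIFT;
}

/*
 * Range lookup over descriptors sorted by phys with no overlaps.
 * A maple tree would answer the same question; this is a stand-in.
 */
static const struct memdesc *memdesc_lookup(const struct memdesc *descs,
					    size_t n, uint64_t phys)
{
	uint64_t idx = memdesc_index(phys);
	size_t lo = 0, hi = n;

	assert(descs != NULL || n == 0);

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		uint64_t first = memdesc_index(descs[mid].phys);
		uint64_t last = first + ((uint64_t)1 << descs[mid].order) - 1;

		if (idx < first)
			hi = mid;
		else if (idx > last)
			lo = mid + 1;
		else
			return &descs[mid];
	}
	return NULL;	/* no memdesc covers this address */
}
```

With 512-byte units, a 4KiB range at 0x1000 occupies indices 8 to 15,
so any address inside it resolves to the same memdesc.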

Now we can have memdescs for an arbitrary power-of-two size, which means
we can ditch all the awful code from ppc/s390 page table handling where
they try to share one memdesc between several different page tables.

It's going to be "fun" avoiding allocation deadlocks where we want to
rebalance the maple tree containing the memdescs ... that's a problem
for five years from now.