Re: [LSF/MM/BPF TOPIC] 64k (or 16k) base page size on x86

In reply to: Kiryl Shutsemau: "Re: [LSF/MM/BPF TOPIC] 64k (or 16k) base page size on x86"

From: Pedro Falcato

Date: Thu Feb 19 2026 - 10:33:59 EST

On Thu, Feb 19, 2026 at 03:08:51PM +0000, Kiryl Shutsemau wrote:
> No, there's no new hardware (that I know of). I want to explore what page size
> means.
>
> The kernel uses the same value - PAGE_SIZE - for two things:
>
> - the order-0 buddy allocation size;
>
> - the granularity of virtual address space mapping;
>
> I think we can benefit from separating these two meanings and allowing
> order-0 allocations to be larger than the virtual address space covered by a
> PTE entry.
>

Doesn't this idea make less sense these days, with mTHP? You can already get
larger folios simply by toggling one of the per-size entries in
/sys/kernel/mm/transparent_hugepage.
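
To see which mTHP sizes a given machine exposes, something like the sketch
below can help. It assumes the per-size layout hugepages-<N>kB/enabled under
/sys/kernel/mm/transparent_hugepage, where the active policy is shown in
square brackets; the helper names parse_selected and mthp_policies are made up
for illustration. On a kernel without mTHP it simply reports nothing.

```python
import glob
import os
import re

# Assumed sysfs location of the THP/mTHP controls.
MTHP_BASE = "/sys/kernel/mm/transparent_hugepage"


def parse_selected(text):
    """Return the bracketed (active) choice, e.g. 'never' from
    'always inherit madvise [never]'; None if nothing is bracketed."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def mthp_policies(base=MTHP_BASE):
    """Map each per-size directory (hugepages-<N>kB) to its active policy."""
    policies = {}
    pattern = os.path.join(base, "hugepages-*kB", "enabled")
    for path in sorted(glob.glob(pattern)):
        size = os.path.basename(os.path.dirname(path))
        with open(path) as f:
            policies[size] = parse_selected(f.read())
    return policies


if __name__ == "__main__":
    for size, policy in mthp_policies().items():
        print(size, policy)
```

Enabling a size is then a matter of writing one of the listed policy names to
the corresponding "enabled" file as root.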

> The main motivation is scalability. Managing memory on multi-terabyte
> machines in 4k is suboptimal, to say the least.
>
> Potential benefits of the approach (assuming 64k pages):
>
> - The order-0 page size cuts struct page overhead by a factor of 16. From
> ~1.6% of RAM to ~0.1%;
>
> - TLB wins on machines with TLB coalescing as long as mapping is naturally
> aligned;
>
> - Order-5 allocation is 2M, resulting in less pressure on the zone lock;
>
> - 1G pages are within possibility for the buddy allocator - order-14
> allocation. It can open the road to 1G THPs.
>
> - As with THP, fewer pages - less pressure on the LRU lock;

We could perhaps add a way to enforce a min_order globally on the page cache,
as a way to address the LRU lock pressure.

There are some points there which aren't addressed by mTHP work in any way
(1G THPs for one), others which are being addressed separately (memdesc work
trying to cut down on struct page overhead).

(I also don't understand your point about order-5 allocation. AFAIK pcp will
cache up to PAGE_ALLOC_COSTLY_ORDER (3) plus PMD order, but I'm probably not
seeing the full picture.)
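
For reference, the arithmetic behind the quoted numbers can be checked with a
few lines. This is only a sketch under two assumptions that are not stated in
the thread: a 64-byte struct page and a 2M PMD mapping size (both typical for
x86-64). The function names are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

STRUCT_PAGE_BYTES = 64  # assumption: typical x86-64 struct page size
PMD_SIZE = 2 * MIB      # assumption: x86-64 PMD mapping size


def order_size(base_page, order):
    """Size in bytes of an order-N buddy block for a given order-0 size."""
    return base_page << order


def struct_page_overhead(base_page, struct_page=STRUCT_PAGE_BYTES):
    """Fraction of RAM spent on one struct page per order-0 page."""
    return struct_page / base_page


def pmd_order(base_page, pmd_size=PMD_SIZE):
    """Buddy order whose block size equals the PMD mapping size."""
    return (pmd_size // base_page).bit_length() - 1


if __name__ == "__main__":
    for base in (4 * KIB, 64 * KIB):
        print(
            f"base={base // KIB}k "
            f"overhead={struct_page_overhead(base):.4%} "
            f"pmd_order={pmd_order(base)} "
            f"order-14={order_size(base, 14) // MIB}M"
        )
```

Under these assumptions a 2M block is order 9 with 4k order-0 pages and
order 5 with 64k order-0 pages, and an order-14 block with 64k pages is 1G.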

--
Pedro