Re: [LSF/MM/BPF TOPIC] 64k (or 16k) base page size on x86
From: Dave Hansen
Date: Thu Feb 19 2026 - 12:09:06 EST
On 2/19/26 07:08, Kiryl Shutsemau wrote:
> - The order-0 page size cuts struct page overhead by a factor of 16. From
> ~1.6% of RAM to ~0.1%;
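(For reference, the quoted percentages fall straight out of the
arithmetic, assuming the usual 64-byte 'struct page' on x86-64 -- my
assumption, not something stated above:)

```shell
# One 64-byte struct page per base page (64 bytes is an assumed size):
pct_4k=$(awk 'BEGIN { printf "%.2f", 64 / 4096  * 100 }')   # per 4k page
pct_64k=$(awk 'BEGIN { printf "%.2f", 64 / 65536 * 100 }')  # per 64k page
echo "4k:  ${pct_4k}% of RAM"
echo "64k: ${pct_64k}% of RAM"
```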
First of all, this looks like fun. Nice work! I'm not at all opposed in
concept to cleaning things up and doing the logical separation you
described, splitting buddy granularity from mapping granularity. That
seems like a worthy endeavor, and some of the union/#define tricks look
like a viable way to do it incrementally.
But I don't think there's going to be a lot of memory savings in the
end. Maybe this would bring the mem= hyperscalers back into the fold and
have them actually start using 'struct page' again for their VM memory.
Dunno.
But, let's look at my kernel directory and round the file sizes up to
4k, 16k and 64k:
find . -printf '%s\n' | while read size; do echo \
	$(((size + 0x0fff) & ~0x0fff)) \
	$(((size + 0x3fff) & ~0x3fff)) \
	$(((size + 0xffff) & ~0xffff));
done
... and add them all up:
11,297,648 KB - on disk
11,297,712 KB - in a 4k page cache
12,223,488 KB - in a 16k page cache
16,623,296 KB - in a 64k page cache
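(The "add them all up" step isn't shown above; a minimal sketch of the
accumulation, using a few stand-in file sizes rather than a real tree:)

```shell
# Same round-up-to-page-size trick as above, accumulated per page size.
# The three sizes here are made up purely for illustration.
disk=0; k4=0; k16=0; k64=0
for size in 5000 70000 130000; do
	disk=$((disk + size))
	k4=$((  k4  + ((size + 0x0fff) & ~0x0fff) ))
	k16=$(( k16 + ((size + 0x3fff) & ~0x3fff) ))
	k64=$(( k64 + ((size + 0xffff) & ~0xffff) ))
done
echo "$((disk / 1024)) KB on disk, $((k4 / 1024)) KB @4k," \
     "$((k16 / 1024)) KB @16k, $((k64 / 1024)) KB @64k"
```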
So a 64k page cache eats ~5GB of extra memory for a kernel tree (well,
_my_ kernel tree). In other words, if you are looking for memory savings
on my laptop, you'll need ~300GB of RAM before 'struct page' overhead
overwhelms the page cache bloat from a single kernel tree.
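(The ~300GB figure is just the numbers above combined: the order-0
change saves about 1.5% of RAM -- ~1.6% down to ~0.1% -- while the 64k
page cache costs ~5.3GB extra for this one tree. A quick
back-of-the-envelope check:)

```shell
# Extra page cache at 64k vs. 4k, from the table above:
bloat_kb=$((16623296 - 11297712))
# struct page savings: ~1.6% -> ~0.1% of RAM, i.e. 1.5% (15/1000).
# Break-even RAM is where the savings equal the bloat:
breakeven_gb=$((bloat_kb * 1000 / 15 / 1024 / 1024))
echo "bloat: ${bloat_kb} KB, break-even RAM: ~${breakeven_gb} GB"
```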
The whole kernel tree obviously isn't in the page cache all at the same
time, and the system-wide page cache obviously looks different from a
single kernel tree, but you get the point.
That's not to diminish how useful something like this might be,
especially for folks who are sensitive to 'struct page' overhead or
allocator performance.
But it will mostly be buying better performance at the _cost_ of
consuming more RAM, not saving RAM.