Re: [LSF/MM/BPF TOPIC] 64k (or 16k) base page size on x86
From: Kiryl Shutsemau
Date: Fri Feb 20 2026 - 10:50:15 EST
On Fri, Feb 20, 2026 at 10:17:45AM -0500, Liam R. Howlett wrote:
> * Kiryl Shutsemau <kas@xxxxxxxxxx> [260220 07:33]:
> > On Thu, Feb 19, 2026 at 10:28:20PM -0500, Liam R. Howlett wrote:
> > > * Kiryl Shutsemau <kas@xxxxxxxxxx> [260219 17:05]:
> > > > On Thu, Feb 19, 2026 at 09:08:57AM -0800, Dave Hansen wrote:
> > > > > On 2/19/26 07:08, Kiryl Shutsemau wrote:
> > > > > > - The order-0 page size cuts struct page overhead by a factor of 16. From
> > > > > > ~1.6% of RAM to ~0.1%;
> > > > > ...
> > > > > But, it will mostly be getting better performance at the _cost_ of
> > > > > consuming more RAM, not saving RAM.
> > > >
> > > > That's fair.
> > > >
> > > > The problem with struct page memory consumption is that it is static and
> > > > cannot be reclaimed. You pay the struct page tax no matter what.
> > > >
> > > > Page cache rounding overhead can be large, but a motivated userspace can
> > > > keep it under control by avoiding splitting a dataset into many small
> > > > files. And this memory is reclaimable.
> > > >
> > >
> > > But we are in reclaim a lot more these days. As I'm sure you are aware,
> > > we are trying to maximize the resources (both cpu and ram) of any
> > > machine powered on. Entering reclaim will consume the cpu time and will
> > > affect other tasks.
> > >
> > > Especially with multiple workload machines, the tendency is to have a
> > > primary focus with the lower desired work being killed, if necessary.
> > > Reducing the overhead just means more secondary tasks, or a bigger
> > > footprint of the ones already executing.
> > >
> > > Increasing the memory pressure will degrade the primary workload more
> > > frequently, even if we recover enough to avoid OOMing the secondary.
> > >
> > > While in the struct page tax world, the secondary task would be killed
> > > after a shorter (and less frequently executed) reclaim comes up short.
> > > So, I would think that we would be degrading the primary workload in an
> > > attempt to keep the secondary alive? Maybe I'm over-simplifying here?
> >
> > I am not sure I fully follow your point.
> >
> > Sizing tasks and scheduling tasks between machines is hard in general.
> > I don't think the balance between struct page tax and page cache
> > rounding overhead is going to be the primary factor.
>
> I think there are more trade offs than what you listed. It's still
> probably worth doing, but I wanted to know if you thought that this would
> cause us to spend more time in reclaim, which seems to be implied above.
> So, another trade-off might be all the reclaim penalty being paid more
> frequently?
I am not sure.
The kernel would need to do less work in reclaim per unit of memory.
Depending on the workload, you might see fewer allocation events and
therefore less frequent reclaim.
It's all too hand-wavy at this stage.
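For reference, the "factor of 16" and the ~1.6% vs ~0.1% figures quoted
earlier in the thread can be checked with a quick back-of-the-envelope
calculation. This sketch assumes sizeof(struct page) == 64 bytes, which
is the common size on x86-64; the exact value depends on kernel config:

```python
# Sketch: check the struct page overhead figures from the thread.
# Assumption: struct page is 64 bytes (typical on x86-64, config-dependent).
STRUCT_PAGE_BYTES = 64

def overhead_percent(page_size_bytes: int) -> float:
    """Fraction of RAM consumed by struct page metadata, as a percentage.

    One struct page is allocated per base page, so the overhead is simply
    sizeof(struct page) / page_size.
    """
    return 100 * STRUCT_PAGE_BYTES / page_size_bytes

print(f"4k pages:  {overhead_percent(4 << 10):.2f}% of RAM")   # ~1.56%
print(f"64k pages: {overhead_percent(64 << 10):.2f}% of RAM")  # ~0.10%
```

Going from 4k to 64k base pages shrinks the per-page metadata by exactly
the 16x page-size ratio, which is where the ~1.6% -> ~0.1% numbers come
from.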
--
Kiryl Shutsemau / Kirill A. Shutemov