Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Gregory Price
Date: Thu Feb 26 2026 - 17:49:24 EST
On Thu, Feb 26, 2026 at 12:54:08AM -0500, Gregory Price wrote:
> On Thu, Feb 26, 2026 at 02:27:24PM +1100, Alistair Popple wrote:
>
> > > If NUMA is the interface we want, then NODE_DATA is the right direction
> > > regardless of struct page's future or what zone it lives in.
> > >
> > > There's no reason to keep per-page pgmap w/ device-to-node mappings.
> >
> > In reality I suspect that's already the case today. I'm not sure we need
> > per-page pgmap.
> >
>
> Probably, and maybe there's a good argument for stealing 80-90% of the
> common surface here, shunting ZONE_DEVICE to use this instead of pgmap
> before we go all the way to private nodes.
>
Out of curiosity I went digging through existing users, and it seems
like the average driver has 1-8 discrete pgmaps. Nouveau is an outlier
that registers ad hoc in 256MB chunks. The relevant annoyances are the
percpu_ref each pgmap uses to track its lifetime, and the fact that a
driver's pgmaps can be non-contiguous.

tl;dr here: a 1-to-1 mapping of node-to-pgmap isn't realistic for most
existing ZONE_DEVICE users, meaning a 1-op lookup (page->pgmap) turns
into a multi-op pointer chase and range comparison.

Not sure that turns out well for anyone. (The extra cost would only hit
ZONE_DEVICE / managed node users; all traditional nodes still have a
simple pgdat or page->flags lookup to check membership.)

There's an argument for trying to do this just for the sake of getting
pgmap out of struct page/folio, but this only deals with the problem on
NUMA systems.

For non-NUMA systems the pgmap still probably ends up in folio_ext
(assuming we get there), but even that might not be sufficient to get
LRU back. Might need Willy's opinion here.

~Gregory