Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Balbir Singh
Date: Wed Jun 03 2026 - 01:00:29 EST
On Tue, Jun 02, 2026 at 09:57:48AM +0100, Gregory Price wrote:
> On Tue, Jun 02, 2026 at 12:16:50PM +1000, Balbir Singh wrote:
> > On Sun, May 24, 2026 at 09:50:06PM -0400, Gregory Price wrote:
> > >
> > > I'm debating on whether to include OPS_MEMPOLICY in the initial version
> > > if only because it's not intuitive how it interacts with pagecache. That
> > > needs more time to bake.
> > >
> >
> > It makes sense to look at it and then decide if it makes sense.
> >
>
> I am thinking I will ship without any OPS flags at all for now and then
> have the introduction of ops as a separate series.
>
> > > alloc_pages_node() is the kernel interface
> >
> > I was thinking we wouldn't need explicit flags and that allocations would
> > happen from user space using __GFP_THISNODE to the node or via a nodemask
> > based on nodes of interest. Is there a reason to add this flag, a system
> > might have more than one source of N_MEMORY_PRIVATE?
> >
>
> There's a few things to unpack here. I discussed this many times on
> list and at LSF, but to reiterate.
>
> 1) __GFP_THISNODE is insufficient to enforce isolation and otherwise
> not particularly useful. Additionally, from userland, it's not
> something you can actually set.
I was thinking mbind()/set_mempolicy() is how we get to it. It already
accepts a nodemask.
>
> for node in possible_nodes:
>     alloc_pages_node(private_node, __GFP_THISNODE)
>
> In fact it's the opposite semantic of what we want.
> THISNODE says: "Do not fall back to OTHER nodes".
>
That's why we need to control the fallback nodes carefully for
N_MEMORY_PRIVATE.
> The semantic we want is "Do not allow allocations from private
> nodes UNLESS we specifically request" (__GFP_PRIVATE).
>
> __GFP_THISNODE does not actually buy you anything here, AND it's
> worse, in the scenario where a private node makes its way into the
> preferred slot (via possible_nodes or some other nodemask), the
> allocator cannot fall back to a node it can access.
>
> __GFP_THISNODE cannot be overloaded to do anything useful here.
Let me clarify: I meant that we use a nodemask for allocation, and
__GFP_THISNODE gets us to the node we desire if that is the only
node in the mask. My earlier comment might not have been clear.
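
To make the two semantics being compared concrete, here is a toy userspace
model of a fallback walk. It is only a sketch and not the kernel's allocator:
struct toy_node, toy_pick_node() and the TOY_GFP_* values are hypothetical
names invented for illustration, and the "skip private nodes unless asked"
rule is my reading of the __GFP_PRIVATE semantic quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the flags under discussion; values are arbitrary. */
#define TOY_GFP_THISNODE (1u << 0)
#define TOY_GFP_PRIVATE  (1u << 1)

struct toy_node {
	bool is_private;          /* models membership in N_MEMORY_PRIVATE */
	unsigned long free_pages; /* pages still available on this node */
};

/*
 * Walk the fallback order and return the first node ID that may satisfy
 * the request, or -1 if none can. order[0] is the preferred node.
 */
static int toy_pick_node(const struct toy_node *nodes, const int *order,
			 size_t n_order, unsigned int flags)
{
	assert(nodes && order && n_order > 0);

	for (size_t i = 0; i < n_order; i++) {
		const struct toy_node *node = &nodes[order[i]];

		/* Private nodes are skipped unless explicitly requested. */
		bool allowed = !node->is_private || (flags & TOY_GFP_PRIVATE);

		if (allowed && node->free_pages > 0)
			return order[i];

		/* THISNODE: never look past the preferred node. */
		if (flags & TOY_GFP_THISNODE)
			break;
	}
	return -1;
}
```

In this model, a private node in the preferred slot combined with
TOY_GFP_THISNODE alone fails outright, while the same request without
THISNODE falls through to the next ordinary node.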
>
> 2) We're trying not to expose *ANY* userland APIs for this, at all.
>
> The ultimate goal here should be one of two things:
>
> 1) fd = open(/dev/xxx, ...);
>    mem = mmap(fd, ...);
>    mem[0] = 0xDEADBEEF; /* Fault device page into page table */
>
> In this case, the driver is responsible for doing the
> alloc_pages_node() call.
>
> or
>
> 2) mem = mmap(NULL, ..., ANON);
>    mbind(mem, ..., private_node);
>    mem[0] = 0xDEADBEEF; /* Fault device page into page table */
>
> in this case mempolicy.c is responsible for doing the
> alloc_pages_node() call via the _mpol() alloc variants.
>
> Additional OPT flags (reclaim, compaction, whatever), would
> (optionally) allow mm/ to operate on the device memory with, for
> example, mmu_notifier callbacks to tell the device to invalidate
> whatever it's caching about that page.
>
> This would all be relatively transparent to userland, all userland
> "knows" is that it's getting memory from a device (/dev/xxx) or a
> node it's otherwise aware of hosting device memory somehow.
>
Why not use the mbind() APIs? Do we want to gate allocation/privileges
via a /dev?
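
For reference, the mbind() path from userland could look roughly like the
sketch below. It uses the raw mbind(2) syscall with MPOL_BIND and a
single-node mask. The helper names (nodemask_set, map_bound_to_node) and the
MASK_LONGS size are assumptions made for illustration, and whether a private
node would be accepted in the nodemask at all is exactly what this thread is
deciding.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/mempolicy.h>	/* MPOL_BIND */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MASK_BITS_PER_LONG (8 * sizeof(unsigned long))
#define MASK_LONGS 16	/* arbitrary: room for 1024 node IDs on 64-bit */

/* Build a nodemask containing only @node. Returns 0, or -1 if out of range. */
static int nodemask_set(unsigned long *mask, size_t nlongs, int node)
{
	if (node < 0 || (size_t)node >= nlongs * MASK_BITS_PER_LONG)
		return -1;

	memset(mask, 0, nlongs * sizeof(*mask));
	mask[node / MASK_BITS_PER_LONG] = 1UL << (node % MASK_BITS_PER_LONG);
	return 0;
}

/*
 * Map @len bytes of anonymous memory and bind the range to @node.
 * Pages are only allocated when the caller first touches the mapping.
 * Returns the mapping, or NULL on any failure.
 */
static void *map_bound_to_node(size_t len, int node)
{
	unsigned long mask[MASK_LONGS];
	void *mem;

	if (nodemask_set(mask, MASK_LONGS, node) != 0)
		return NULL;

	mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED)
		return NULL;

	/* The fifth argument (maxnode) is the size of the mask in bits. */
	if (syscall(SYS_mbind, mem, len, MPOL_BIND, mask,
		    MASK_LONGS * MASK_BITS_PER_LONG, 0) != 0) {
		munmap(mem, len);
		return NULL;
	}
	return mem;
}
```

A caller would do something like mem = map_bound_to_node(len, node) and then
write to mem[0] to fault the first page in, with no /dev node involved.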
Balbir