Re: [LSF/MM/BPF TOPIC][RFC PATCH v4 00/27] Private Memory Nodes (w/ Compressed RAM)
From: Gregory Price
Date: Wed Jun 10 2026 - 16:13:17 EST
On Wed, Jun 10, 2026 at 08:59:59PM +0200, David Hildenbrand (Arm) wrote:
> On 6/10/26 18:37, Gregory Price wrote:
> > On Wed, Jun 10, 2026 at 05:00:33PM +0200, David Hildenbrand (Arm) wrote:
> >> On 6/10/26 12:41, Gregory Price wrote:
> >
> > So, I remember this being asked, and I didn't fully grok the request.
> >
> > I'm still not sure I fully understand the question, so apologies if I'm
> > answering the wrong things here.
> >
> > I understand this question in two ways:
> >
> > 1) Can we disallow PAGE allocation and limit this to FOLIO allocation
>
> Yes. Can we only allow folios to be allocated from private memory nodes. So let
> me reply to that one below.
>
... snip ...
>
> At LSF/MM we talked about how GFP flags are bad and how deriving stuff from the
> context might be better. I think there was also talk about how the memalloc_*
> interface might be a better way forward. Maybe we would start giving the
> allocator more context ("we are allocating a folio").
>
> The following is incomplete (esp. hugetlb stuff I assume), just as some idea:
>
Ok, the mental gap I have is not knowing the full context behind
memalloc. I'll take this and do some reading / prototyping, but
this looks entirely reasonable.
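
For readers without the memalloc background, here is a minimal userspace
model of the scope pattern the quoted patch leans on. It assumes the save
helper sets bits in the task flags and returns only the bits it newly set,
so that nested scopes unwind correctly. task_flags, private_node_allowed()
and folio_alloc_model() are made-up stand-ins for illustration, not kernel
symbols; only the bit value is taken from the patch below.

```c
/* Hypothetical stand-in for current->flags (per task in the kernel). */
static _Thread_local unsigned int task_flags;

/* Bit value taken from the quoted patch; everything else is a model. */
#define PF_MEMALLOC_FOLIO 0x00800000u

/* Set @flags and return only the bits this call newly set. */
static unsigned int memalloc_flags_save(unsigned int flags)
{
	unsigned int newly_set = ~task_flags & flags;

	task_flags |= flags;
	return newly_set;
}

/* Clear exactly the bits the matching save call reported as newly set. */
static void memalloc_flags_restore(unsigned int flags)
{
	task_flags &= ~flags;
}

/* Hypothetical allocator-side check: is a folio allocation in flight? */
static int private_node_allowed(void)
{
	return (task_flags & PF_MEMALLOC_FOLIO) != 0;
}

/* Model of a folio wrapper: the scope covers only the inner allocation. */
static int folio_alloc_model(void)
{
	unsigned int flags = memalloc_flags_save(PF_MEMALLOC_FOLIO);
	int allowed = private_node_allowed();

	memalloc_flags_restore(flags);
	return allowed;
}
```

In this model a nested scope gets 0 back from the save call, so its
restore clears nothing and the outer scope stays intact.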
I will still probably send the next RFC version tomorrow or Friday,
as I want to get some eyes on the __GFP_PRIVATE-less pattern.
Also, I made a new `anondax` driver which enables userland testing
of this functionality without any specialty hardware.
tl;dr:
    fd = open("/dev/anondax0.0", ....);
    buf = mmap(fd, ...);
    buf[0] = 0xDEADBEEF; /* fault to anondax driver */
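
Spelled out as compilable C, the userland side might look roughly like the
sketch below. The open flags, the MAP_SHARED mapping type, the zero offset
and the helper name touch_first_word() are all assumptions made for
illustration; the tl;dr above elides them and the driver may want
something different.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map @len bytes of @path and store to the first
 * word, which is what triggers the first fault on the mapping.
 * Returns 0 on success and -1 on any failure.
 */
static int touch_first_word(const char *path, size_t len)
{
	volatile uint32_t *buf;
	void *map;
	int fd, ret;

	fd = open(path, O_RDWR);		/* assumed open flags */
	if (fd < 0)
		return -1;

	/* Assumed: shared mapping at offset 0. */
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		close(fd);
		return -1;
	}

	buf = map;
	buf[0] = 0xDEADBEEF;			/* first touch faults the page in */
	ret = (buf[0] == 0xDEADBEEF) ? 0 : -1;

	munmap(map, len);
	close(fd);
	return ret;
}
```

Calling touch_first_word("/dev/anondax0.0", 4096) from a small main()
would exercise the fault path; on a machine without the device node it
simply returns -1.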
static vm_fault_t anon_dax_fault(struct vm_fault *vmf)
{
	struct dev_dax *dev_dax = vmf->vma->vm_file->private_data;
	vm_fault_t ret;
	int id;

	id = dax_read_lock();
	if (!dax_alive(dev_dax->dax_dev))
		ret = VM_FAULT_SIGBUS;
	else
		ret = do_anonymous_page_node(vmf, dev_dax->target_node);
	dax_read_unlock(id);

	if (ret & VM_FAULT_OOM)
		return VM_FAULT_SIGBUS;
	return ret ? ret : VM_FAULT_NOPAGE;
}
With:
    qemu-system-x86_64 -m 5G \
        -object memory-backend-ram,id=m0,size=4G -numa node,nodeid=0,memdev=m0 \
        -object memory-backend-ram,id=m1,size=1G -numa node,nodeid=1,memdev=m1 \
        -append "... memmap=0x40000000!0x140000000"
Voila - buddy-managed private anonymous memory (1G region)
No need to reinvent page_alloc.c or fault handling :]
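
The memmap numbers can be sanity-checked with a little arithmetic. The
sketch below assumes the guest keeps 3 GiB of RAM below the 4 GiB PCI hole
and places the remainder starting at 4 GiB; that split is an assumption
about the machine type (a different split moves the address), and
node_phys_start() is a made-up helper name.

```c
#include <stdint.h>

#define GiB (1ULL << 30)

/*
 * Hypothetical helper: physical start address of a node, given how much
 * guest RAM precedes it (@ram_before) and how much RAM sits below the
 * 4 GiB hole (@low_split, assumed rather than read from the guest).
 */
static uint64_t node_phys_start(uint64_t ram_before, uint64_t low_split)
{
	if (ram_before < low_split)
		return ram_before;
	/* RAM past the low split continues at the 4 GiB boundary. */
	return 4 * GiB + (ram_before - low_split);
}
```

With 4 GiB on node 0 ahead of it and a 3 GiB low split, node 1 starts at
4 GiB + 1 GiB = 0x140000000, and its 1 GiB size is 0x40000000, which
matches memmap=0x40000000!0x140000000 (size!start).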
This can be used to hammer on reclaim/compaction/whatever support
without needing any particular hardware setup, and in fact it gives
some memory devices a path to support in userland while standards
get worked out.
do_anonymous_page_node is a bit of a bodge right now but I just haven't
fleshed it out yet. The idea is - don't reinvent the fault path, just
provide the appropriate context to memory.c to do the right thing.
If this is acceptable, I imagine whatever interface gets implemented
will carry an in-tree driver export only, similar to hotplug/kmem.
> From 64aaff5f40497201ecc089c3339df6576184c433 Mon Sep 17 00:00:00 2001
> From: "David Hildenbrand (Arm)" <david@xxxxxxxxxx>
> Date: Wed, 10 Jun 2026 20:55:49 +0200
> Subject: [PATCH] tmp
>
> Signed-off-by: David Hildenbrand (Arm) <david@xxxxxxxxxx>
> ---
> include/linux/sched.h | 2 +-
> include/linux/sched/mm.h | 11 +++++++++++
> mm/mempolicy.c | 14 ++++++++++++--
> mm/page_alloc.c | 7 ++++++-
> 4 files changed, 30 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index ee06cba5c6f5..9c850b7be6bf 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1778,7 +1778,7 @@ extern struct pid *cad_pid;
> * I am cleaning dirty pages from some other bdi. */
> #define PF_KTHREAD 0x00200000 /* I am a kernel thread */
> #define PF_RANDOMIZE 0x00400000 /* Randomize virtual address space */
> -#define PF__HOLE__00800000 0x00800000
> +#define PF_MEMALLOC_FOLIO 0x00800000 /* Allocating a folio that can end up on
> private memory nodes */
> #define PF__HOLE__01000000 0x01000000
> #define PF__HOLE__02000000 0x02000000
> #define PF_NO_SETAFFINITY 0x04000000 /* Userland is not allowed to meddle with
> cpus_mask */
> diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
> index 95d0040df584..2101a447c084 100644
> --- a/include/linux/sched/mm.h
> +++ b/include/linux/sched/mm.h
> @@ -471,6 +471,17 @@ static inline void memalloc_pin_restore(unsigned int flags)
> memalloc_flags_restore(flags);
> }
>
> +static inline unsigned int memalloc_folio_save(void)
> +{
> + return memalloc_flags_save(PF_MEMALLOC_FOLIO);
> +}
> +
> +static inline void memalloc_folio_restore(unsigned int flags)
> +{
> + memalloc_flags_restore(flags);
> +}
> +
> +
> #ifdef CONFIG_MEMCG
> DECLARE_PER_CPU(struct mem_cgroup *, int_active_memcg);
> /**
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 36699fabd3c2..a78b0e5a1fce 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2506,8 +2506,13 @@ static struct page *alloc_pages_mpol(gfp_t gfp, unsigned
> int order,
> struct folio *folio_alloc_mpol_noprof(gfp_t gfp, unsigned int order,
> struct mempolicy *pol, pgoff_t ilx, int nid)
> {
> - struct page *page = alloc_pages_mpol(gfp | __GFP_COMP, order, pol,
> + struct page *page;
> + int flags;
> +
> + flags = memalloc_folio_save();
> + page = alloc_pages_mpol(gfp | __GFP_COMP, order, pol,
> ilx, nid);
> + memalloc_folio_restore(flags);
> if (!page)
> return NULL;
>
> @@ -2588,7 +2593,12 @@ EXPORT_SYMBOL(alloc_pages_noprof);
>
> struct folio *folio_alloc_noprof(gfp_t gfp, unsigned int order)
> {
> - return page_rmappable_folio(alloc_pages_noprof(gfp | __GFP_COMP, order));
> + struct folio *folio;
> + int flags;
> +
> + flags = memalloc_folio_save();
> + folio = page_rmappable_folio(alloc_pages_noprof(gfp | __GFP_COMP, order));
> + memalloc_folio_restore(flags);
> + return folio;
> }
> EXPORT_SYMBOL(folio_alloc_noprof);
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index ee902a468c2f..37434b37f7af 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5345,8 +5345,13 @@ EXPORT_SYMBOL(__alloc_pages_noprof);
> struct folio *__folio_alloc_noprof(gfp_t gfp, unsigned int order, int
> preferred_nid,
> nodemask_t *nodemask)
> {
> - struct page *page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
> + struct page *page;
> + int flags;
> +
> + flags = memalloc_folio_save();
> + page = __alloc_pages_noprof(gfp | __GFP_COMP, order,
> preferred_nid, nodemask);
> + memalloc_folio_restore(flags);
> return page_rmappable_folio(page);
> }
> EXPORT_SYMBOL(__folio_alloc_noprof);
> --
> 2.43.0
>
>
> --
> Cheers,
>
> David