Re: [RFC LPC2026 PATCH v2 00/11] Specific Purpose Memory NUMA Nodes

In reply to: Alistair Popple: "Re: [RFC LPC2026 PATCH v2 00/11] Specific Purpose Memory NUMA Nodes"

From: David Hildenbrand (Red Hat)

Date: Mon Nov 24 2025 - 04:23:38 EST

[...]

2) The addition of `cpuset.mems.sysram` which restricts allocations to
`mt_sysram_nodes` unless GFP_SPM_NODE is used.

SPM Nodes are still allowed in cpuset.mems.allowed and effective.

This is done to allow separate control over sysram and SPM node sets
by cgroups while maintaining the existing hierarchical rules.

current cpuset configuration
cpuset.mems_allowed
 |.mems_effective       < (mems_allowed ∩ parent.mems_effective)
 |->tasks.mems_allowed  < cpuset.mems_effective

new cpuset configuration
cpuset.mems_allowed
 |.mems_effective       < (mems_allowed ∩ parent.mems_effective)
 |.sysram_nodes         < (mems_effective ∩ default_sys_nodemask)
 |->task.sysram_nodes   < cpuset.sysram_nodes

This means mems_allowed still restricts all node usage in any given
task context, which is the existing behavior.
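
For illustration only, the mask derivation above can be modelled in user space roughly as follows. This is a sketch, not the patch's code: `nodemask_t` is simplified to a 64-bit bitmap here, and `struct cpuset_model` and `cpuset_model_update()` are hypothetical names; only the two intersections are taken from the description above.

```c
#include <stdint.h>

/* Simplified stand-in for the kernel's nodemask_t: one bit per NUMA node. */
typedef uint64_t nodemask_t;

/* Hypothetical model of the per-cpuset masks named in the diagram. */
struct cpuset_model {
	nodemask_t mems_allowed;   /* configured by the user */
	nodemask_t mems_effective; /* mems_allowed ∩ parent.mems_effective */
	nodemask_t sysram_nodes;   /* mems_effective ∩ default_sys_nodemask */
};

/* Recompute the derived masks from the parent's effective mask. */
static void cpuset_model_update(struct cpuset_model *cs,
				nodemask_t parent_effective,
				nodemask_t default_sys_nodemask)
{
	cs->mems_effective = cs->mems_allowed & parent_effective;
	cs->sysram_nodes = cs->mems_effective & default_sys_nodemask;
}
```

Because sysram_nodes is derived from mems_effective, it can never contain a node that mems_allowed excludes, which matches the statement that mems_allowed still bounds all node usage.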

3) Addition of MHP_SPM_NODE flag to instruct memory_hotplug.c that the
capacity being added should mark the node as an SPM Node.

Sounds a bit like the wrong interface for configuring this. This smells like a per-node setting that should be configured before hotplugging any memory.

A node is either SysRAM or SPM - never both. Attempting to add
incompatible memory to a node results in hotplug failure.

DAX and CXL are made aware of the bit and have `spm_node` bits added
to their relevant subsystems.
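
A minimal sketch of the "either SysRAM or SPM" rule, assuming a node's type is fixed by the first memory added to it. The enum, `struct node_model`, `node_model_add_memory()` and the choice of -EINVAL are all hypothetical and only illustrate the rule as described; they are not taken from the series.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-node type; a node with no memory yet has no type. */
enum node_mem_type { NODE_MEM_NONE, NODE_MEM_SYSRAM, NODE_MEM_SPM };

struct node_model {
	enum node_mem_type type;
};

/*
 * Model of hot-adding memory to a node. 'spm' stands in for the caller
 * having passed MHP_SPM_NODE. Mixing types on one node is rejected;
 * the error code is an assumption.
 */
static int node_model_add_memory(struct node_model *node, bool spm)
{
	enum node_mem_type want = spm ? NODE_MEM_SPM : NODE_MEM_SYSRAM;

	if (node->type != NODE_MEM_NONE && node->type != want)
		return -EINVAL;

	node->type = want;
	return 0;
}
```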

4) Adding GFP_SPM_NODE - which allows page_alloc.c to request memory
from the provided node or nodemask. It changes the behavior of
the cpuset mems_allowed and mt_node_allowed() checks.

I wonder why that is required. Couldn't we disallow allocation from these special nodes by default, and only allow it if someone explicitly passes in the node for allocation?

What's the problem with that?
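
To make the alternative concrete, here is a rough user-space sketch of such a policy, assuming the allocator can tell whether the caller named the node explicitly. `node_alloc_allowed()`, `spm_nodes` and `explicit_nid` are hypothetical names for illustration, and `nodemask_t` is simplified to a 64-bit bitmap (so nid must be in 0..63).

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's nodemask_t: one bit per NUMA node. */
typedef uint64_t nodemask_t;

/*
 * Hypothetical policy check: SysRAM nodes are always eligible, while SPM
 * nodes are eligible only when the caller asked for that node explicitly.
 * No GFP flag is involved in this model.
 */
static bool node_alloc_allowed(int nid, nodemask_t spm_nodes, bool explicit_nid)
{
	bool is_spm = (spm_nodes >> nid) & 1ULL;

	return !is_spm || explicit_nid;
}
```

With node 2 marked as SPM, a default allocation would skip node 2, while a request that names node 2 would be allowed.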

--
Cheers

David