[PATCH v8 2/4] mm/page_owner: add NUMA node filter

From: Zhen Ni

Date: Wed May 20 2026 - 07:39:57 EST


Add a NUMA node filter to page_owner so that the output can be
restricted to pages on specific NUMA node(s). This is useful for
NUMA-aware memory allocation analysis and debugging.

The filter supports flexible input formats:
- Single node: nid=0
- Multiple nodes: nid=0,2,3
- Node range: nid=0-3
- Mixed format: nid=0,2-4,7
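
As a rough illustration of the accepted list syntax, the userspace sketch
below parses the same formats into a 64-bit mask. It is not the kernel's
nodelist_parse(): the names sketch_parse_nodelist and SKETCH_MAX_NODES are
hypothetical, the 64-node limit is an assumption made for brevity, and
edge cases such as whitespace or trailing commas may be handled
differently by the kernel parser.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_MAX_NODES 64	/* hypothetical limit, for this sketch only */

/*
 * Parse a list such as "0", "0,2,3", "0-3" or "0,2-4,7" into a bitmask.
 * Returns 0 on success, -EINVAL on malformed or empty input, and
 * -ERANGE if a node number is >= SKETCH_MAX_NODES.
 * On failure, *mask is left untouched.
 */
static int sketch_parse_nodelist(const char *s, uint64_t *mask)
{
	uint64_t out = 0;

	assert(s != NULL && mask != NULL);

	while (*s) {
		char *end;
		unsigned long lo, hi;

		/* Every element must start with a digit. */
		if (*s < '0' || *s > '9')
			return -EINVAL;
		lo = strtoul(s, &end, 10);
		hi = lo;
		s = end;

		/* Optional "-hi" part turns the element into a range. */
		if (*s == '-') {
			s++;
			if (*s < '0' || *s > '9')
				return -EINVAL;
			hi = strtoul(s, &end, 10);
			s = end;
		}

		if (lo > hi)
			return -EINVAL;
		if (hi >= SKETCH_MAX_NODES)
			return -ERANGE;

		for (unsigned long n = lo; n <= hi; n++)
			out |= UINT64_C(1) << n;

		if (*s == ',') {
			s++;
			if (*s == '\0')
				return -EINVAL;	/* trailing comma */
		} else if (*s != '\0') {
			return -EINVAL;		/* unexpected character */
		}
	}

	if (out == 0)
		return -EINVAL;			/* empty list */

	*mask = out;
	return 0;
}
```

With this sketch, "0,2-4,7" sets bits 0, 2, 3, 4 and 7, while an empty
string or a reversed range such as "3-1" is rejected.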

Example usage:
# Using the page_owner_filter tool (recommended)
./page_owner_filter -n 0-3
./page_owner_filter -m stack_handle -n 0,2-4,7

The implementation keeps per-file-descriptor filter state in
file->private_data, so each opener has an independent filter
configuration. The node set is stored in a nodemask_t and parsed with
nodelist_parse(), which accepts the formats listed above. The parsed mask
is checked with nodes_subset() against node_states[N_MEMORY], so nodes
without memory are rejected with -EINVAL, as is an empty node list.

Signed-off-by: Zhen Ni <zhen.ni@xxxxxxxxxxxx>
---

Changes in v8:
- Add cond_resched() in page iteration loop to prevent RCU stalls
- Reject empty nid list to avoid enabling an empty filter
- Improve comment: "Commit all filter changes"

Changes in v7:
- per-file-descriptor implementation

Changes in v6:
- Add node validity check using nodes_subset() to reject invalid node
  numbers that don't exist in the system
- Move bool filter_by_nid declaration to top of block
- Use kmalloc_objs instead of kmalloc
- Remove 100 bytes overhead

Changes in v5:
- Optimize nodes_empty() check in page iteration loop
- Add __data_racy qualifier to nid_mask field

Changes in v4:
- Remove "-1" support, use empty string to clear filter
- Use strncpy_from_user() instead of copy_from_user()
- Add concurrency safety documentation for nid_mask access
- Rename fops to page_owner_nid_filter_fops for consistency

Changes in v3:
- Remove READ_ONCE/WRITE_ONCE for nodemask_t (fixes compilation errors)
  * nodemask_t is a large structure (128 bytes) that triggers compile-time asserts
  * Direct assignment is safe for this use case
- Add comment explaining input length calculation formula
  * 6 bytes = ",NNNNN" (comma + 5-digit node number)
- Simplify "-1" check using kstrtoint() instead of dual strcmp()
- Move nodemask_t mask read outside PFN iteration loop for performance
  * Avoids 128-byte structure copy on each iteration

Changes in v2:
- Use nodemask_t instead of int to support multiple nodes
- Use nodelist_parse() to support flexible input formats
  * Single node: "0", "2"
  * Multiple nodes: "0,2,3"
  * Ranges: "0-3"
  * Mixed: "0,2-4,7"
- Use %*pbl format for output (e.g., "0-2", "0,2-4,7")
- Use dynamic memory allocation (kmalloc) to handle variable-length input
- Follow cpuset's max_write_len pattern: (100 + 6 * MAX_NUMNODES)

v7: https://lore.kernel.org/linux-mm/20260515091942.1535677-3-zhen.ni@xxxxxxxxxxxx/
v6: https://lore.kernel.org/linux-mm/20260511033017.747781-3-zhen.ni@xxxxxxxxxxxx/
v5: https://lore.kernel.org/linux-mm/20260507064643.179187-3-zhen.ni@xxxxxxxxxxxx/
v4: https://lore.kernel.org/linux-mm/20260430163247.13628-3-zhen.ni@xxxxxxxxxxxx/
v3: https://lore.kernel.org/linux-mm/20260428071112.1420380-4-zhen.ni@xxxxxxxxxxxx/
v2: https://lore.kernel.org/linux-mm/20260419155540.376847-4-zhen.ni@xxxxxxxxxxxx/
v1: https://lore.kernel.org/linux-mm/20260417154638.22370-4-zhen.ni@xxxxxxxxxxxx/
---
mm/page_owner.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++---
1 file changed, 47 insertions(+), 3 deletions(-)

diff --git a/mm/page_owner.c b/mm/page_owner.c
index d0c428d6cac3..59cfbc64a117 100644
--- a/mm/page_owner.c
+++ b/mm/page_owner.c
@@ -68,6 +68,8 @@ static const char * const page_owner_print_mode_strings[] = {

struct page_owner_filter_state {
enum page_owner_print_mode print_mode;
+ nodemask_t nid_filter;
+ bool nid_filter_enabled;
};

static bool page_owner_enabled __initdata;
@@ -767,6 +769,13 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
if (!handle)
goto ext_put_continue;

+ if (state->nid_filter_enabled) {
+ int page_nid = page_to_nid(page);
+
+ if (!node_isset(page_nid, state->nid_filter))
+ goto ext_put_continue;
+ }
+
/* Record the next PFN to read in the file offset */
*ppos = pfn + 1;

@@ -776,6 +785,8 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
&page_owner_tmp, handle, state);
ext_put_continue:
page_ext_put(page_ext);
+ if (need_resched())
+ cond_resched();
}

return 0;
@@ -883,6 +894,8 @@ static int page_owner_open(struct inode *inode, struct file *file)
return -ENOMEM;

state->print_mode = PAGE_OWNER_PRINT_STACK;
+ nodes_clear(state->nid_filter);
+ state->nid_filter_enabled = false;
file->private_data = state;
return 0;
}
@@ -903,12 +916,18 @@ static ssize_t page_owner_write(struct file *file,
int ret;
size_t max_input_len;
struct page_owner_filter_state *state = file->private_data;
+ enum page_owner_print_mode new_print_mode = state->print_mode;
+ nodemask_t new_nid_filter = state->nid_filter;
+ bool new_nid_filter_enabled = state->nid_filter_enabled;

/*
* Maximum input length for filter commands:
- * 32: print_mode command max length is 17 ("mode=stack_handle").
+ * - 32: headroom for the print_mode command, whose longest form
+ * is "mode=stack_handle" (17 bytes)
+ * - 6 * MAX_NUMNODES: worst case for the nid list, where each node
+ * takes ",NNNNN" (comma + 5-digit node number) = 6 bytes
*/
- max_input_len = 32;
+ max_input_len = 32 + 6 * MAX_NUMNODES;

if (count > max_input_len)
return -EINVAL;
@@ -928,13 +947,38 @@ static ssize_t page_owner_write(struct file *file,
token + 5);
if (ret < 0)
goto out_free;
- state->print_mode = ret;
+ new_print_mode = ret;
+ } else if (!strncmp(token, "nid=", 4)) {
+ ret = nodelist_parse(token + 4, new_nid_filter);
+ if (ret < 0)
+ goto out_free;
+
+ if (nodes_empty(new_nid_filter)) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+
+ /*
+ * The filter selects pages by NUMA node, so make sure that every
+ * specified node has memory.
+ */
+ if (!nodes_subset(new_nid_filter, node_states[N_MEMORY])) {
+ ret = -EINVAL;
+ goto out_free;
+ }
+
+ new_nid_filter_enabled = true;
} else {
ret = -EINVAL;
goto out_free;
}
}

+ /* Commit all filter changes */
+ state->print_mode = new_print_mode;
+ state->nid_filter = new_nid_filter;
+ state->nid_filter_enabled = new_nid_filter_enabled;
+
ret = count;

out_free:
--
2.20.1