Re: [PATCH v6 2/3] mm/page_owner: add NUMA node filter with nodelist support
From: Oscar Salvador
Date: Mon May 11 2026 - 05:05:35 EST
On Mon, May 11, 2026 at 11:30:16AM +0800, Zhen Ni wrote:
> Add NUMA node filtering functionality to page_owner to allow filtering
> pages by specific NUMA node(s). This is useful for NUMA-aware memory
> allocation analysis and debugging.
>
> The filter supports flexible nodelist input formats:
> - Single node: echo "0" > nid
> - Multiple nodes: echo "0,2,3" > nid
> - Node range: echo "0-3" > nid
> - Mixed format: echo "0,2-4,7" > nid
> - Clear filter: echo > nid (empty string)
>
> The implementation uses nodemask_t for efficient multi-node filtering
> and nodelist_parse() for flexible input parsing. Empty input clears
> the filter.
>
> Note: Access to nid_mask uses plain load/store without locking because
> nodemask_t is too large (128 bytes) for READ_ONCE/WRITE_ONCE. This is
> safe for debug use: low-frequency changes and torn reads would only
> cause temporary inconsistency in debug output.
>
> Signed-off-by: Zhen Ni <zhen.ni@xxxxxxxxxxxx>
> ---
...
> ---
> mm/page_owner.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 92 insertions(+)
>
> diff --git a/mm/page_owner.c b/mm/page_owner.c
> index 27a412c52d41..8a38005539ff 100644
> --- a/mm/page_owner.c
> +++ b/mm/page_owner.c
...
> @@ -700,6 +708,9 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
> while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
> pfn++;
>
> + mask = owner_filter.nid_mask;
> + filter_by_nid = !nodes_empty(mask);
> +
> /* Find an allocated page */
> for (; pfn < max_pfn; pfn++) {
> /*
> @@ -732,6 +743,14 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
> if (unlikely(!page_ext))
> continue;
>
> + /* NUMA node filter using bitmask */
> + if (filter_by_nid) {
This comment is rather pointless because it explains something the code
already makes quite clear.
Either drop it or just go with "NUMA node filter"; "using bitmask"
does not really add much.
> + int nid = page_to_nid(page);
> +
> + if (!node_isset(nid, mask))
> + goto ext_put_continue;
> + }
> +
> /*
> * Some pages could be missed by concurrent allocation or free,
> * because we don't hold the zone lock.
> @@ -1043,6 +1062,77 @@ static const struct file_operations page_owner_print_mode_fops = {
> .llseek = default_llseek,
> };
>
> +static ssize_t nid_filter_write(struct file *file,
> + const char __user *buf,
> + size_t count, loff_t *ppos)
> +{
> + char *kbuf;
> + nodemask_t mask;
> + int ret;
> +
> + /*
> + * Limit input size to handle worst-case nodelist (all nodes).
> + * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes.
> + */
> + if (count > (6 * MAX_NUMNODES))
> + return -EINVAL;
> +
> + kbuf = kmalloc_objs(*kbuf, count + 1);
> + if (!kbuf)
> + return -ENOMEM;
> +
> + if (strncpy_from_user(kbuf, buf, count) < 0) {
> + ret = -EFAULT;
> + goto out_free;
> + }
> + kbuf[count] = '\0';
> +
> + /* Support nodelist format like "0", "0,2", "0-3", or empty to clear */
> + if (nodelist_parse(kbuf, mask)) {
> + ret = -EINVAL;
> + goto out_free;
> + }
nodelist_parse() can return error codes other than -EINVAL.
Something like
	ret = nodelist_parse(kbuf, mask);
	if (ret < 0)
		goto out_free;
might be cleaner. (It has to go through out_free rather than return
directly, so that kbuf is still freed.)
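A minimal userspace sketch of that pattern, for illustration only. The
names sketch_parse_nodelist(), sketch_filter_write() and SKETCH_MAX_NODES
are hypothetical stand-ins, not kernel APIs, and the parser only
approximates the nodelist format described in the changelog. The point is
that the parser's own error code reaches the caller and the buffer is
freed on every path.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_MAX_NODES 64	/* hypothetical limit, not MAX_NUMNODES */

/*
 * Hypothetical stand-in for nodelist_parse(). Accepts "0", "0,2", "0-3",
 * "0,2-4,7" or an empty string. Returns 0 on success, -EINVAL for a
 * malformed list and -ERANGE for a node number that is too large.
 */
static int sketch_parse_nodelist(const char *s, unsigned long long *mask)
{
	unsigned long long m = 0;

	while (*s && *s != '\n') {
		unsigned long lo, hi, n;
		char *end;

		if (!isdigit((unsigned char)*s))
			return -EINVAL;
		lo = hi = strtoul(s, &end, 10);
		if (*end == '-') {
			s = end + 1;
			if (!isdigit((unsigned char)*s))
				return -EINVAL;
			hi = strtoul(s, &end, 10);
		}
		if (lo > hi)
			return -EINVAL;
		if (hi >= SKETCH_MAX_NODES)
			return -ERANGE;
		for (n = lo; n <= hi; n++)
			m |= 1ULL << n;
		s = end;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -EINVAL;
	}
	*mask = m;
	return 0;
}

/* Same shape as the write handler: one cleanup label, ret propagated. */
static long sketch_filter_write(const char *buf, size_t count,
				unsigned long long *filter)
{
	unsigned long long mask;
	char *kbuf;
	long ret;

	assert(buf && filter);

	kbuf = malloc(count + 1);
	if (!kbuf)
		return -ENOMEM;
	memcpy(kbuf, buf, count);
	kbuf[count] = '\0';

	ret = sketch_parse_nodelist(kbuf, &mask);
	if (ret < 0)
		goto out_free;	/* keep the parser's error, still free kbuf */

	*filter = mask;
	ret = (long)count;

out_free:
	free(kbuf);
	return ret;
}
```

With this shape, a caller can tell a malformed list (-EINVAL) apart from
an out-of-range node (-ERANGE), and a failed write leaves the previous
filter untouched.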
> +
> + /* Validate that all specified nodes actually exist in the system */
> + if (!nodes_subset(mask, node_states[N_MEMORY])) {
> + ret = -EINVAL;
> + goto out_free;
> + }
OK, I get that since you want to filter allocations by NUMA node, you
want to make sure that those nodes have memory.
That might change due to concurrent memory-hotplug operations, but that
is a different story.
I do not like the comment though, because other nodes can exist in the
system with no memory (e.g. memoryless nodes that have only CPUs, or
nodes with neither CPUs nor memory), so I would make that clearer:
"
/*
* We want to filter memory allocations by numa nodes, so make sure
* that the specified nodes have memory.
*/
"
or something along those lines.
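To illustrate the distinction, here is a small userspace sketch. It uses
plain 64-bit masks and a hypothetical helper sketch_nodes_subset(); the
kernel code uses nodemask_t and nodes_subset(). The example node layout
is made up: nodes 0-3 exist, but node 2 has no memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when every bit set in 'requested' is also set in 'allowed'. */
static bool sketch_nodes_subset(uint64_t requested, uint64_t allowed)
{
	return (requested & ~allowed) == 0;
}

/* Hypothetical system: nodes 0-3 exist, node 2 is memoryless. */
static const uint64_t sketch_existing_nodes = 0xfULL;	/* 0,1,2,3 */
static const uint64_t sketch_memory_nodes   = 0xbULL;	/* 0,1,3   */
```

A request for nodes {0,2} passes an "exists" check but fails the "has
memory" check, which is why the comment should talk about memory rather
than existence.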
> +
> + owner_filter.nid_mask = mask;
> + ret = count;
> +
> +out_free:
> + kfree(kbuf);
> + return ret;
> +}
> +
> +static int nid_filter_show(struct seq_file *m, void *v)
> +{
> + nodemask_t mask = owner_filter.nid_mask;
> +
> + if (nodes_empty(mask))
> + seq_puts(m, "\n");
> + else
> + seq_printf(m, "%*pbl\n", nodemask_pr_args(&mask));
Isn't nodemask_pr_args() with "%*pbl" clever enough to print nothing (or
"0") when the mask is NODE_MASK_NONE?
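For reference, a userspace sketch of range-list formatting. This is not
the kernel's "%*pbl" implementation; sketch_format_list() is a
hypothetical helper over a 64-bit mask. If the kernel formatter behaves
the same way for an empty mask (an assumption worth checking against
lib/vsprintf.c), the nodes_empty() special case above is redundant.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the set bits of 'mask' as a comma-separated list of ranges,
 * e.g. "0,2-4,7". Returns the number of characters written, excluding
 * the terminating NUL. An empty mask produces an empty string.
 */
static size_t sketch_format_list(uint64_t mask, char *out, size_t len)
{
	size_t used = 0;
	int bit = 0;

	if (len == 0)
		return 0;
	out[0] = '\0';

	while (bit < 64) {
		int start, n;

		if (!((mask >> bit) & 1)) {
			bit++;
			continue;
		}

		/* Found the start of a run; advance to one past its end. */
		start = bit;
		while (bit < 64 && ((mask >> bit) & 1))
			bit++;

		if (bit - 1 == start)
			n = snprintf(out + used, len - used, "%s%d",
				     used ? "," : "", start);
		else
			n = snprintf(out + used, len - used, "%s%d-%d",
				     used ? "," : "", start, bit - 1);

		if (n < 0 || (size_t)n >= len - used) {
			out[used] = '\0';	/* drop the partial run */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

In this sketch the empty mask needs no special handling: the loop simply
finds no runs and the output stays empty.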
--
Oscar Salvador
SUSE Labs