Re: [PATCH v5 2/3] mm/page_owner: add NUMA node filter with nodelist support

From: zhen.ni

Date: Sat May 09 2026 - 04:43:27 EST




在 2026/5/9 08:44, SeongJae Park 写道:
On Thu, 7 May 2026 14:46:42 +0800 Zhen Ni <zhen.ni@xxxxxxxxxxxx> wrote:

Add NUMA node filtering functionality to page_owner to allow filtering
pages by specific NUMA node(s). This is useful for NUMA-aware memory
allocation analysis and debugging.

The filter supports flexible nodelist input formats:
- Single node: echo "0" > nid
- Multiple nodes: echo "0,2,3" > nid
- Node range: echo "0-3" > nid
- Mixed format: echo "0,2-4,7" > nid
- Clear filter: echo > nid (empty string)

The implementation uses nodemask_t for efficient multi-node filtering
and nodelist_parse() for flexible input parsing. Empty input clears
the filter.

Note: Access to nid_mask uses plain load/store without locking because
nodemask_t is too large (128 bytes) for READ_ONCE/WRITE_ONCE. This is
safe for debug use: low-frequency changes and torn reads would only
cause temporary inconsistency in debug output.

Signed-off-by: Zhen Ni <zhen.ni@xxxxxxxxxxxx>
---

Changes in v5:
- Optimize nodes_empty() check in page iteration loop
- Add __data_racy qualifier to nid_mask field

Adding links to previous revisions [1] would be helpful.

Will add lore links.


---
mm/page_owner.c | 86 +++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)

diff --git a/mm/page_owner.c b/mm/page_owner.c
[...]
@@ -700,6 +707,9 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
pfn++;
+ mask = owner_filter.nid_mask;
+ bool filter_by_nid = !nodes_empty(mask);
+

Shouldn't we separate variable declarations and statements inside a same block?


I will fix this in v6 by declaring all variables at the beginning of the block:

nodemask_t mask;
bool filter_by_nid;

mask = owner_filter.nid_mask;
filter_by_nid = !nodes_empty(mask);
[...]
+static ssize_t nid_filter_write(struct file *file,
+ const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ char *kbuf;
+ nodemask_t mask;
+ int ret;
+
+ /*
+ * Limit input size to handle worst-case nodelist (all nodes).
+ * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes.
+ * Formula: 100 bytes overhead + 6 * MAX_NUMNODES

What is the 100 bytes overhead?

The 100 bytes is intended as a safety margin, but it's not strictly necessary.
Maybe I should simplify it to just 6 * MAX_NUMNODES?

+ */
+ if (count > (100 + 6 * MAX_NUMNODES))
+ return -EINVAL;
+
+ kbuf = kmalloc(count + 1, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;

Would it make sense to use kmalloc_objs()?

I'll update the code to use
kmalloc_objs(char, count + 1, GFP_KERNEL)

Thanks for the review!

[1] https://docs.kernel.org/process/submitting-patches.html#commentary


Thanks,
SJ

[...]


Best regards,
Zhen