Re: [PATCH v3 3/4] mm/page_owner: add NUMA node filter with nodelist support
From: zhen.ni
Date: Thu Apr 30 2026 - 01:12:32 EST
在 2026/4/29 22:56, SeongJae Park 写道:
On Wed, 29 Apr 2026 17:03:56 +0800 "zhen.ni" <zhen.ni@xxxxxxxxxxxx> wrote:Concurrency Safety:
[...]
在 2026/4/29 09:28, SeongJae Park 写道:
On Tue, 28 Apr 2026 15:11:11 +0800 Zhen Ni <zhen.ni@xxxxxxxxxxxx> wrote:
The reason is that `owner_filter.nid_mask` is a nodemask_t, which is a@@ -685,6 +685,7 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
struct page_ext *page_ext;
struct page_owner *page_owner;
depot_stack_handle_t handle;
+ nodemask_t mask;
if (!static_branch_unlikely(&page_owner_inited))
return -EINVAL;
@@ -698,6 +699,8 @@ read_page_owner(struct file *file, char __user *buf, size_t count, loff_t *ppos)
while (!pfn_valid(pfn) && (pfn & (MAX_ORDER_NR_PAGES - 1)) != 0)
pfn++;
+ mask = owner_filter.nid_mask;
+
READ_ONCE() was used for owner_filter.print_mode. Should nid_mask also read
using READ_ONCE()?
128-byte structure. READ_ONCE() only supports types up to 8 bytes and
will trigger a compile-time assertion failure for larger structures.
This was actually an issue in v2 - the AI review tool (sashiko.dev) and
Andrew both caught the compilation error with READ_ONCE/WRITE_ONCE on
nodemask_t, so v3 removed them.
Thank you for kindly sharing the context. Now I understand why READ_ONCE()
cannot be used. But, is plain load/store safe enough for nodemask_t?
Shouldn't it still be protected against races?
I considered spinlock and RCU, but decided against them:
- Spinlock: Adds overhead on every read, overkill for a debug facility
- RCU: Requires dynamic allocation of 128-byte nodemask_t, too complex
- READ_ONCE/WRITE_ONCE: Not possible, exceeds 8-byte limit
Plain load/store is safe here because:
1. page_owner is debug code with low-frequency filter changes
2. Worst case of torn read is temporary inconsistency in debug output
3. Similar debugfs interfaces use the same approach
The overhead of locking doesn't justify the benefit for this debug use case.
Do you think this is acceptable, or would you prefer I add locking?
[...]I chose "-1" to clearly differentiate from valid NUMA node IDs (0, 1, 2, 3...).Since node IDs are non-negative integers, "-1" naturally means "invalid" or "no filter", which is an intuitive convention in Linux (e.g., pid -1, signal -1).
Yes, empty input (echo > nid) works because nodelist_parse() handles it+static ssize_t nid_filter_write(struct file *file,
+ const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ char *kbuf;
+ nodemask_t mask;
+ int ret;
+ int val;
+
+ /*
+ * Limit input size to handle worst-case nodelist (all nodes).
+ * Worst case per node: ",NNNNN" (comma + 5-digit node number) = 6 bytes.
+ * Formula: 100 bytes overhead + 6 * MAX_NUMNODES
+ */
+ if (count > (100 + 6 * MAX_NUMNODES))
+ return -EINVAL;
+
+ kbuf = kmalloc(count + 1, GFP_KERNEL);
+ if (!kbuf)
+ return -ENOMEM;
+
+ if (copy_from_user(kbuf, buf, count)) {
+ ret = -EFAULT;
+ goto out_free;
+ }
+ kbuf[count] = '\0';
+
+ /* Support: "-1" to clear, or nodelist format like "0", "0,2", "0-3" */
+ if (kstrtoint(kbuf, 10, &val) == 0 && val == -1)
+ nodes_clear(mask);
+ else if (nodelist_parse(kbuf, mask)) {
+ ret = -EINVAL;
+ goto out_free;
+ }
Doesn't empty string input to nodelist_parse() clears the mask? Can't it be
reused?
correctly. However, nodelist_parse() - which is implemented via
bitmap_parselist() - cannot handle "-1" as it's not a valid range format
and would return an error. The explicit "-1" check is necessary to
support `echo "-1" > nid` without returning an error.
So the "-1" check handles a case that nodelist_parse() cannot handle.
Thank you for kindly explaining the reason. But, do we really need to support
"-1" input? Couldn't we just redefine the interface?
Do you have a better suggestion for how to represent "clear filter"?
[...]
For consistency with the other file_operations+ debugfs_create_file("nid", 0600, filter_dir, NULL,
+ &nid_filter_fops);
Why don't you use 'page_owner_' prefix like other fops, for consistency?
in this module (page_owner_fops, page_owner_threshold_fops,
page_owner_print_mode_fops), I'll rename nid_filter_fops to
page_owner_nid_filter_fops.
I'll incorporate these improvements in the next version.
Thank you for kindly accepting my humble suggestions.
Thanks for the detailed review!
Thank you for sharing this nice work, too!
Thanks,
SJ
[...]