Re: [PATCH v8 0/4] mm/page_owner: add per-fd filter infrastructure for print_mode and NUMA filtering

From: zhen.ni

Date: Fri May 22 2026 - 04:55:09 EST

在 2026/5/22 08:20, Andrew Morton 写道:

On Wed, 20 May 2026 15:56:37 +0800 Zhen Ni <zhen.ni@xxxxxxxxxxxx> wrote:

This patch series introduces per-file-descriptor filtering capabilities to the
page_owner feature.

Thanks. AI review appears to have found a bunch of new things to
complain about. Can you please check?

https://sashiko.dev/#/patchset/20260515091942.1535677-1-zhen.ni@xxxxxxxxxxxx

I'm interested in learning how much of this is accurate, and how useful
you're finding that review to be?

Thanks. This is a response to review feedback on patch 1/4
(mm/page_owner: add print_mode filter).

There's a race condition between
page_owner_write() and print_page_owner() when sharing the same
file descriptor. The sequential reads of state->print_mode can
indeed lead to inconsistent behavior if a concurrent write
changes the mode between the two checks.

Proposed solution: Add spinlock for proper protection

I'll add a spinlock to page_owner_filter_state to properly serialize
concurrent access:

struct page_owner_filter_state {
enum page_owner_print_mode print_mode;
nodemask_t nid_filter;
bool nid_filter_enabled;
spinlock_t lock; /* Protect concurrent access */
};

Write operation - atomic update:

static ssize_t page_owner_write(struct file *file,
const char __user *buf,
size_t count, loff_t *ppos)
{
struct page_owner_filter_state *state = file->private_data;
unsigned long flags;

// Parse input (without lock)
// ...

// Atomic commit
spin_lock_irqsave(&state->lock, flags);
state->print_mode = new_print_mode;
state->nid_filter = new_nid_filter;
state->nid_filter_enabled = new_nid_filter_enabled;
spin_unlock_irqrestore(&state->lock, flags);

return count;
}

Read operation - consistent snapshot:

print_page_owner(...) {
struct page_owner_filter_state *state = file->private_data;
enum page_owner_print_mode print_mode;
nodemask_t nid_filter;
bool nid_enabled;
unsigned long flags;

// Get consistent snapshot (cached once)
spin_lock_irqsave(&state->lock, flags);
print_mode = state->print_mode;
nid_filter = state->nid_filter;
nid_enabled = state->nid_filter_enabled;
spin_unlock_irqrestore(&state->lock, flags);

// Use snapshot for all checks
if (print_mode != PAGE_OWNER_PRINT_HANDLE) {
// Print stack
}

if (print_mode != PAGE_OWNER_PRINT_STACK) {
// Print handle
}
}

Why READ_ONCE/WRITE_ONCE is not sufficient:

While READ_ONCE/WRITE_ONCE can prevent data tears on individual
fields, they cannot guarantee atomic updates across multiple
fields.

I'll prepare a follow-up patch to add the spinlock protection.

-------------------------------------------------------------

This is a response to review feedback on patch 2/4
(mm/page_owner: add NUMA node filter).

Issue 1: page_to_nid() may trigger page poison check

Why page poison check can be triggered:
The root cause is the lockless page iteration. As noted in the comment at line 741-746:
/*
* Some pages could be missed by concurrent allocation or free,
* because we don't hold the zone lock.
*/
Without holding the zone lock, there's a race window where a page may be in an inconsistent state during concurrent allocation or free.

When page_to_nid(page) is called:
#define page_to_nid(page) memdesc_nid(PF_POISONED_CHECK(page)->flags)

PF_POISONED_CHECK checks if page->flags is in an inconsistent state during this race window. If detected, it triggers VM_BUG_ON_PGFLAGS(), causing a kernel crash.

Current problematic code (mm/page_owner.c:772-777):
if (state->nid_filter_enabled) {
int page_nid = page_to_nid(page); // May trigger VM_BUG_ON_PGFLAGS

if (!node_isset(page_nid, state->nid_filter))
goto ext_put_continue;
}

Fix: Use memdesc_nid(page->flags) to extract nid directly from flags, bypassing PF_POISONED_CHECK:
if (state->nid_filter_enabled) {
int page_nid = memdesc_nid(page->flags); // Direct access, no check

if (!node_isset(page_nid, state->nid_filter))
goto ext_put_continue;
}

Why the suggested pfn_to_nid() doesn't work:
I checked the implementation in include/linux/mmzone.h:2377-2381:
#ifdef CONFIG_NUMA
#define pfn_to_nid(pfn) \
({ \
unsigned long __pfn_to_nid_pfn = (pfn); \
page_to_nid(pfn_to_page(__pfn_to_nid_pfn)); \
})
#endif
On NUMA systems, pfn_to_nid() internally calls page_to_nid(), which still triggers PF_POISONED_CHECK. Therefore it doesn't solve the problem.

Issue 2: Concurrent access protection

This issue is already addressed in patch 1/4.

-------------------------------------------------------------

This is a response to review feedback on patch 3/4

Issue 1: isdigit() with potentially negative signed char Accepted.

I'll fix this by casting to unsigned char:
if (!isdigit((unsigned char)*p)) {
fprintf(stderr, "Error: Invalid character '%c' in nid_list\n",
*p);
return -1;
}
This ensures the value is in the valid range for ctype functions and avoids undefined behavior with non-ASCII input.

Issue 2: NID validation allows values > 65535

No change planned. The kernel already validates NID values. If an invalid NID is sent, the kernel will reject it with EINVAL.

Issue 3: Manual -h handling interferes with getopt

No change planned. This is intentional. The manual -h check at line 155-162 allows the usage message to be displayed even on older kernels without page_owner support or per-fd filtering. This helps users understand the tool's functionality without needing to consult separate documentation, even when running on kernels that don't support these features. The edge case of -o -h is extremely rare in practice (user wanting to write to a file named "-h"), and the benefit of always-showable help documentation outweighs this theoretical issue.

Issue 4: Error message doesn't distinguish permissions from kernel support

No change planned. Harmless.

Issue 5: Performance of fflush() and fprintf() in read loop

I'll change to use fwrite() instead of fprintf() and move the flush to after the loop:

Current code:
while ((ret = read(fd, buf, sizeof(buf) - 1)) > 0) {
buf[ret] = '\0';
fprintf(output, "%s", buf);
fflush(output);
}
Changed to:
while ((ret = read(fd, buf, sizeof(buf))) > 0) {
fwrite(buf, 1, ret, output);
}
fflush(output);

This avoids the redundant strlen() scan from fprintf() and frequent flush calls, while still ensuring all data is written before the program continues.

Best regards,
Zhen Ni